11 July 2000

IMPROVED FOLLICLE STIMULATING HORMONE PVS 
FIELD OF THE INVENTION 

The present invention relates to new polypeptides, to new polypeptide conjugates exhibiting 
follicle stimulating hormone (FSH) activity, to methods for preparing such polypeptides or con- 
jugates, and to the use of such polypeptides or conjugates in therapy, in particular in the treat- 
ment of infertility. 

BACKGROUND OF THE INVENTION 

Follicle Stimulating Hormone (FSH) is a dimeric hormone consisting of an α subunit and a β
subunit. The α subunit is common to the glycoprotein hormone family, which apart from FSH
includes chorionic gonadotropin (CG), thyroid stimulating hormone (TSH), and luteinizing
hormone (LH), whereas the β subunit is specific to FSH. The human wildtype α subunit is a 92
amino acid glycoprotein, the amino acid sequence of which is shown in SEQ ID NO 2. Said
subunit is referred to herein as hFSH-α. The human wildtype β subunit is a 111 amino acid
glycoprotein having the amino acid sequence shown in SEQ ID NO 4. This subunit is referred to
herein as hFSH-β. hFSH-α comprises 5 cystines formed by the cysteines located in positions 7 and 31,
10 and 60, 28 and 82, 59 and 87, and 32 and 84, respectively. hFSH-β comprises 12 cysteines
corresponding to 6 cystines located in positions 3 and 51, 17 and 66, 20 and 104, 28 and 82, 32
and 84, and 87 and 94, respectively.
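The cystine map given above can be checked mechanically against the subunit sequences. The sketch below is illustrative only: the two sequence strings are assumed to reproduce the mature hFSH-α and hFSH-β sequences of SEQ ID NO 2 and SEQ ID NO 4 (they should be verified against the sequence listing), and `check_cystines` is a hypothetical helper, not part of any library.

```python
# Assumed to reproduce SEQ ID NO 2 (hFSH-alpha, 92 residues) and
# SEQ ID NO 4 (hFSH-beta, 111 residues); written in blocks of 10 residues.
HFSH_ALPHA = (
    "APDVQDCPEC" "TLQENPFFSQ" "PGAPILQCMG" "CCFSRAYPTP" "LRSKKTMLVQ"
    "KNVTSESTCC" "VAKSYNRVTV" "MGGFKVENHT" "ACHCSTCYYH" "KS"
)
HFSH_BETA = (
    "NSCELTNITI" "AIEKEECRFC" "ISINTTWCAG" "YCYTRDLVYK" "DPARPKIQKT"
    "CTFKELVYET" "VRVPGCAHHA" "DSLYTYPVAT" "QCHCGKCDSD" "STDCTVRGLG"
    "PSYCSFGEMK" "E"
)

# Cystine (disulfide) pairs as listed in the text, 1-based positions.
ALPHA_CYSTINES = [(7, 31), (10, 60), (28, 82), (59, 87), (32, 84)]
BETA_CYSTINES = [(3, 51), (17, 66), (20, 104), (28, 82), (32, 84), (87, 94)]


def check_cystines(seq: str, pairs: list[tuple[int, int]]) -> bool:
    """True if every listed position is a Cys, no Cys is used twice,
    and the pairs account for every Cys in the sequence."""
    positions = [p for pair in pairs for p in pair]
    if len(set(positions)) != len(positions):
        return False  # one cysteine cannot take part in two disulfides
    if any(seq[p - 1] != "C" for p in positions):  # 1-based -> 0-based
        return False
    all_cys = [i + 1 for i, aa in enumerate(seq) if aa == "C"]
    return sorted(positions) == all_cys


print(len(HFSH_ALPHA), check_cystines(HFSH_ALPHA, ALPHA_CYSTINES))  # 92 True
print(len(HFSH_BETA), check_cystines(HFSH_BETA, BETA_CYSTINES))    # 111 True
```

The check confirms that 5 pairs account for all 10 cysteines of hFSH-α and 6 pairs for all 12 cysteines of hFSH-β.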

Human FSH (hFSH) has been isolated from pituitary glands and from post-menopausal urine
(EP 322 438) and has been produced recombinantly in mammalian cells (US 5,639,640, US
5,156,957, US 4,923,805, US 4,840,896, EP 211,894 and EP 521,586). The latter references
also disclose the hFSH-β gene. US 5,405,945 discloses a modified human α subunit gene
comprising only one intron.

US 4,589,402 and US 4,845,077 disclose purified hFSH which is free of LH and the use thereof
for in vitro fertilization. EP 322 438 discloses a protein with at least 6200 U/mg FSH activity
which is substantially free of LH activity, and wherein the FSH α subunit and β subunit,
respectively, may be wildtype or specified truncated forms thereof.

Liu et al., J Biol Chem 1993, 15;268(2):21613-7; Grossmann et al., Mol Endocrinol 1996;
10(6):769-79; Roth and Dias, Mol Cell Endocrinol 1995, 1;109(2):143-9; Valove et al.,
Endocrinology 1994; 135(6):2657-61; Yoo et al., J Biol Chem 1993, 25;268(18):13034-42;
US 5,508,261; and Chappel et al., 1998, Human Reproduction, 13(3):18-35 disclose various
structure-function relationship studies and identify amino acid residues involved in receptor
binding and activation and in dimerization of FSH.

It has been found that glycosylation of FSH-α and FSH-β is essential for receptor signal
transduction. hFSH-α comprises two N-glycosylation sites at the asparagines located at positions 52
and 78, whereas hFSH-β comprises two N-glycosylation sites at the asparagines located at
positions 7 and 24. The importance of the various N-glycosylation sites for the binding and
signal-transducing activities of FSH is discussed, inter alia, by Valove et al., Endocrinology 1994;
135(6):2657-61 and Flack et al., J Biol Chem 1994, 13;269(19):14015-20.

Galway et al., Endocrinology 1990; 127(1):93-100 demonstrate that FSH variants produced in an
N-acetylglucosamine transferase-I CHO cell line or a CHO cell line defective in sialic acid
transport are as active in vitro as FSH secreted by wildtype cells or purified pituitary FSH, but
lack in vivo activity, possibly due to rapid clearance of the inadequately glycosylated variants
in serum. D'Antonio et al., Human Reprod 1999; 14(5):1160-7 describe various FSH isoforms
circulating in the blood stream. The isoforms have identical amino acid sequences, but differ in
their extent of post-translational modification. It was found that the less acidic isoform group
had a faster in vivo clearance than the acidic isoform group, possibly due to differences in
sialic acid content between the isoforms. No significant difference in in vitro activity was
observed between the isoforms. A similar result has been reported in US 5,087,615 and, for
CHO-produced recombinant FSH isoforms, by de Leeuw et al., Mol Hum Reprod 1996;
2(5):361-9.

US 5,087,615 discloses a method for stimulating follicle development and ovulation in a female
patient by administering FSH to said patient during the follicular phase of the ovulatory cycle,
the improvement comprising initially administering a first FSH isoform having a relatively long
plasma half-life and subsequently administering a second FSH isoform having a shorter plasma
half-life.

Bishop et al., Endocrinology 1995; 136(6):2635-40 conclude that circulatory half-life appears
to be the primary determinant of in vivo activity.

Attempts have been made to prolong the serum half-life of FSH. US 5,338,835 and US
5,585,345 disclose a modified FSH-β subunit extended at the C-terminal Glu with the carboxy
terminal portion (CTP) region of hCG (the entire region consisting of the amino acid sequence
which occurs between positions 112-118 and 145, inclusive, and comprising four O-linked
glycosylation sites located at positions 121, 127, 132 and 138). The resulting modified subunit is
stated to have the biological activity of native FSH, but a prolonged circulating half-life. US
5,405,945 discloses that the carboxy terminal portion of the CG β subunit or a variant thereof
has significant effects on the clearance of CG, FSH, and LH.

US 5,883,073 discloses single-chain proteins comprised of two α subunits with agonist or
antagonist activity for CG, TSH, LH and FSH. The α subunits may be the human wildtype or a
variant thereof, e.g. incorporating part of or the entire CTP region of hCG. Furthermore, the α
subunit may be a variant in which amino acid residues between positions 50 and 60 are
substituted, especially in positions 51, 53 and 55, or wherein Lys91 is converted to methionine or
glutamic acid. The single-chain proteins can be combined with an appropriate β subunit.

US 5,508,261 discloses heterodimeric polypeptides having binding affinity to LH and FSH
receptors comprising a glycoprotein hormone α subunit and a non-naturally occurring β subunit
polypeptide, wherein the β subunit polypeptide is a chain of amino acids comprising four joined
subsequences, each of which is selected from a list of specific sequences.

US 5,567,422 and WO 98/32466 suggest that FSH, among a vast number of other therapeutic 
proteins, may be PEGylated. 

Currently, FSH is used therapeutically to stimulate the growth and maturation of ovarian
follicles in infertile women. In particular, FSH is used in connection with in vitro fertilization as
well as for the treatment of anovulatory women, with anovulatory syndrome or luteal phase
deficiency. However, one problem encountered in current FSH treatment is the short in vivo
half-life of FSH, which requires frequent, usually daily, administration of the product. The
frequent administration is very inconvenient for the patient and results in high fluctuations of
FSH activity in the blood stream, which is undesirable and may cause inadequate maturation of
the follicles.

Therefore, a clinical need exists for a product which provides part or all of the therapeutically
relevant effects of FSH, which may be administered at less frequent intervals than currently
available FSH products, and which preferably provides a more stable level of circulating FSH
activity than that obtainable by current treatment. The present invention is directed to such
products as well as to the means of making such products.



BRIEF DISCLOSURE OF THE INVENTION 

More specifically, the present invention relates to polypeptide conjugates exhibiting FSH activ- 
ity and methods for their preparation and their use in medical treatment. 

Accordingly, in its first aspect the invention relates to a conjugate exhibiting FSH activity, com- 
prising 

i) a polypeptide comprising FSH-α and FSH-β subunits, wherein at least one of said FSH-α and
FSH-β subunits differs from the corresponding wildtype subunit in that at least one amino acid
residue comprising an attachment group for a non-polypeptide moiety has been
introduced or removed, and

ii) a non-polypeptide moiety bound to an attachment group of said polypeptide. 

In a further aspect the invention relates to a polypeptide conjugate exhibiting FSH activity, 
comprising 

i) a polypeptide comprising FSH-α and FSH-β subunits, wherein the amino acid sequence of at
least one of said FSH-α and FSH-β subunits differs from that of the corresponding wildtype
subunit in that at least one N-glycosylation site has been introduced, and

ii) an oligosaccharide moiety bound to an N-glycosylation site of said polypeptide. 

In the above aspects the corresponding respective wildtype subunits are preferably hFSH-α and
hFSH-β.

Another aspect of the invention relates to a polypeptide conjugate exhibiting FSH activity, 
comprising a polypeptide comprising FSH-α and FSH-β subunits, wherein at least one of said
FSH-α and FSH-β subunits comprises at least one introduced N- or O-glycosylation site at the
N-terminus thereof, said at least one introduced glycosylation site being glycosylated.

In a further aspect, the invention relates to a polypeptide conjugate exhibiting FSH activity, 
comprising a polypeptide comprising FSH-α and FSH-β subunits, wherein at least one of said
FSH-α and FSH-β subunits comprises a polymer molecule bound to the N-terminus thereof.

In a still further aspect the invention relates to a substantially homogeneous preparation of a
conjugate of the invention.

In a further aspect the invention relates to generally novel modified FSH-α and modified FSH-β
polypeptides. The polypeptides of the invention are contemplated to be useful as such for
therapeutic, diagnostic or other purposes, but are of particular interest as intermediate products
for the preparation of a conjugate of the invention.
In still further aspects the invention relates to means and methods for preparing a conjugate or a
polypeptide of the invention, including nucleotide sequences and expression vectors encoding a 
polypeptide or a conjugate of the invention. 

In final aspects the invention relates to a therapeutic composition comprising a conjugate,
polypeptide or preparation of the invention and methods of treating a mammal with such
composition. In particular, the polypeptide, conjugate or composition of the invention may be used to
treat infertility.

DETAILED DISCLOSURE OF THE INVENTION

Definitions 

In the context of the present application and invention the following definitions apply: 

The term "conjugate" is intended to indicate a heterogeneous molecule formed by the
covalent attachment of one or more polypeptides to one or more non-polypeptide moieties
such as polymer molecules, lipophilic compounds, carbohydrate moieties or organic
derivatizing agents. The term "covalent attachment" means that the polypeptide and the
non-polypeptide moiety are either directly covalently joined to one another, or else are indirectly
covalently joined to one another through an intervening moiety or moieties, such as a bridge,
spacer, or linkage moiety or moieties. Preferably, the conjugate is soluble at relevant
concentrations and conditions, i.e. soluble in physiological fluids such as blood. The term
"non-conjugated polypeptide" may be used about the polypeptide part of the conjugate.

The "polymer molecule" is a molecule formed by covalent linkage of two or more
monomers, wherein none of the monomers is an amino acid residue, except where the
polymer is human albumin or another abundant plasma protein. The term "polymer" may be
used interchangeably with the term "polymer molecule". The term is intended to cover
carbohydrate molecules attached by in vitro glycosylation. Carbohydrate molecules attached
by in vivo glycosylation, such as N- or O-glycosylation (as further described below), are
referred to herein as "an oligosaccharide moiety". Except where the number of polymer
molecules is expressly indicated, every reference to "a polymer", "a polymer molecule",
"the polymer" or "the polymer molecule" contained in a polypeptide of the invention or
otherwise used in the present invention shall be a reference to one or more polymer molecule(s).

The term "attachment group" is intended to indicate a functional group of the polypeptide, in
particular of an amino acid residue thereof or an oligosaccharide moiety, capable of attaching
a non-peptide moiety such as a polymer molecule, a lipophilic molecule or an organic
derivatizing agent. Useful attachment groups and their matching non-peptide moieties are
apparent from the table below.
Attachment group | Amino acid | Examples of non-peptide moiety | Conjugation method/Activated PEG | Reference
---|---|---|---|---
-NH2 | N-terminal, Lys | Polymer, e.g. PEG, with amide or imine group | mPEG-SPA; tresylated mPEG | Shearwater Inc.; Delgado et al., Critical Reviews in Therapeutic Drug Carrier Systems 9(3,4):249-304 (1992)
-COOH | C-term, Asp, Glu | Polymer, e.g. PEG, with ester or amide group; oligosaccharide moiety | mPEG-Hz; in vitro coupling | Shearwater Inc.
-SH | Cys | Polymer, e.g. PEG, with disulfide, maleimide or vinyl sulfone group; oligosaccharide moiety | PEG-vinylsulphone; PEG-maleimide; in vitro coupling | Shearwater Inc.; Delgado et al., Critical Reviews in Therapeutic Drug Carrier Systems 9(3,4):249-304 (1992)
-OH | Ser, Thr, OH-Lys | Oligosaccharide moiety; PEG with ester, ether, carbamate or carbonate group | In vivo O-linked glycosylation | -
-CONH2 | Asn as part of an N-glycosylation site | Oligosaccharide moiety; polymer, e.g. PEG | In vivo N-glycosylation | -
Aromatic residue | Phe, Tyr, Trp | Oligosaccharide moiety | In vitro coupling | -
-CONH2 | Gln | Oligosaccharide moiety | In vitro coupling | Yan and Wold, Biochemistry, 1984, Jul 31; 23(16):3759-65
Aldehyde, ketone | Oxidized oligosaccharide | Polymer, e.g. PEG, PEG-hydrazide | PEGylation | Andresz et al., 1978, Makromol. Chem. 179:301; WO 92/16555; WO 00/23114
Guanidino | Arg | Oligosaccharide moiety | In vitro coupling | Lundblad and Noyes, Chemical Reagents for Protein Modification, CRC Press Inc., Boca Raton, FL
Imidazole ring | His | Oligosaccharide moiety | In vitro coupling | As for guanidino



For in vivo N-glycosylation, the term "attachment group" is used in an unconventional way
to indicate the amino acid residues constituting an N-glycosylation site (with the sequence
N-X'-S/T/C-X", wherein X' is any amino acid residue except proline, X" is any amino acid
residue which may or may not be identical to X' and which preferably is different from proline,
N is asparagine, and S/T/C is either serine, threonine or cysteine, preferably serine or
threonine, and most preferably threonine). Although the asparagine residue of the
N-glycosylation site is where the oligosaccharide moiety is attached during glycosylation, such
attachment cannot be achieved unless the other amino acid residues of the N-glycosylation
site are present. Accordingly, when the non-peptide moiety is an oligosaccharide moiety and
the conjugation is to be achieved by N-glycosylation, the term "amino acid residue comprising
an attachment group for the non-peptide moiety" as used in connection with alterations of
the amino acid sequence of the polypeptide of interest is to be understood as meaning that one
or more amino acid residues constituting an N-glycosylation site are to be altered in such a
manner that a functional N-glycosylation site is either introduced into the amino acid
sequence or removed from said sequence.
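A sequence scan for the N-X'-S/T/C motif defined above may be sketched as follows. `find_sequons` is a hypothetical helper; the sequence strings are assumed to reproduce SEQ ID NO 2 and SEQ ID NO 4; and, because the text states only that X" is *preferably* not proline, the sketch does not examine the X" position at all. A motif match is a necessary, not a sufficient, condition for glycosylation.

```python
# Assumed to reproduce SEQ ID NO 2 (hFSH-alpha) and SEQ ID NO 4 (hFSH-beta).
HFSH_ALPHA = (
    "APDVQDCPEC" "TLQENPFFSQ" "PGAPILQCMG" "CCFSRAYPTP" "LRSKKTMLVQ"
    "KNVTSESTCC" "VAKSYNRVTV" "MGGFKVENHT" "ACHCSTCYYH" "KS"
)
HFSH_BETA = (
    "NSCELTNITI" "AIEKEECRFC" "ISINTTWCAG" "YCYTRDLVYK" "DPARPKIQKT"
    "CTFKELVYET" "VRVPGCAHHA" "DSLYTYPVAT" "QCHCGKCDSD" "STDCTVRGLG"
    "PSYCSFGEMK" "E"
)


def find_sequons(seq: str, allow_cys: bool = True) -> list[int]:
    """Return 1-based positions of Asn residues starting an N-X'-S/T(/C) motif.

    X' must not be proline. The trailing X" residue is NOT checked (it is only
    a preference in the text), so a motif ending at the C-terminus still counts.
    """
    acceptors = "STC" if allow_cys else "ST"
    sites = []
    for i in range(len(seq) - 2):
        if seq[i] == "N" and seq[i + 1] != "P" and seq[i + 2] in acceptors:
            sites.append(i + 1)  # report 1-based positions, as in the text
    return sites


print(find_sequons(HFSH_ALPHA))                  # [52, 78]
print(find_sequons(HFSH_BETA, allow_cys=False))  # [7, 24]
print(find_sequons(HFSH_BETA))                   # [1, 7, 24]
```

The scan recovers Asn52 and Asn78 of hFSH-α and Asn7 and Asn24 of hFSH-β. With cysteine allowed as the third residue it also reports N1-S2-C3 of hFSH-β, although the text names only Asn7 and Asn24 as the N-glycosylation sites of that subunit, so a pattern match alone over-predicts occupied sites.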

In the present application, amino acid names and atom names (e.g. CA, CB, NZ, N, O, C,
etc.) are used as defined by the Protein DataBank (PDB), which is based on the IUPAC
nomenclature (IUPAC Nomenclature and Symbolism for Amino Acids and Peptides (residue
names, atom names etc.), Eur. J. Biochem., 138, 9-37 (1984), together with their corrections
in Eur. J. Biochem., 152, 1 (1985)). The term "amino acid residue" is intended to indicate an
amino acid residue contained in the group consisting of alanine (Ala or A), cysteine (Cys or
C), aspartic acid (Asp or D), glutamic acid (Glu or E), phenylalanine (Phe or F), glycine
(Gly or G), histidine (His or H), isoleucine (Ile or I), lysine (Lys or K), leucine (Leu or L),
methionine (Met or M), asparagine (Asn or N), proline (Pro or P), glutamine (Gln or Q),
arginine (Arg or R), serine (Ser or S), threonine (Thr or T), valine (Val or V), tryptophan
(Trp or W), and tyrosine (Tyr or Y) residues. The terminology used for identifying amino
acid positions/substitutions is illustrated as follows: E9(a) indicates position #9 occupied by a
glutamic acid residue in the amino acid sequence shown in SEQ ID NO 2. E9(a)N indicates
that said glutamic acid residue has been substituted by an asparagine residue. Amino acid
residues are numbered relative to the amino acid sequence shown in SEQ ID NO 2 (for
FSH-α) and SEQ ID NO 4 (for FSH-β). Multiple substitutions are indicated with a "+", e.g.
M109(b)N+E111(b)S/T means an amino acid sequence which comprises a substitution of the
methionine residue in position 109 of FSH-β by an asparagine residue and a substitution of the
glutamic acid residue in position 111 of FSH-β by a serine or a threonine residue.

The term "nucleotide sequence" is intended to indicate a consecutive stretch of two or more
nucleotide molecules. The nucleotide sequence may be of genomic, cDNA, RNA, semisynthetic
or synthetic origin, or any combination thereof.



The term "polymerase chain reaction" or "PCR" generally refers to a method for amplifica- 
tion of a desired nucleotide sequence in vitro, as described, for example, in US 4,683,195. In 
general, the PCR method involves repeated cycles of primer extension synthesis, using 
oligonucleotide primers capable of hybridising preferentially to a template nucleic acid. 

"Cell", "host cell", "cell line" and "cell culture" are used interchangeably herein and all 
such terms should be understood to include progeny resulting from growth or culturing of a 
cell. "Transformation" and "transfection" are used interchangeably to refer to the process of 
introducing DNA into a cell. 

"Operably linked" refers to the covalent joining of two or more nucleotide sequences, by 
means of enzymatic ligation or otherwise, in a configuration relative to one another such that 
the normal function of the sequences can be performed. For example, the nucleotide sequence 
encoding a presequence or secretory leader is operably linked to a nucleotide sequence for a 
polypeptide if it is expressed as a preprotein that participates in the secretion of the 
polypeptide; a promoter or enhancer is operably linked to a coding sequence if it affects the
transcription of the sequence; a ribosome binding site is operably linked to a coding sequence 
if it is positioned so as to facilitate translation. Generally, "operably linked" means that the 
nucleotide sequences being linked are contiguous and, in the case of a secretory leader, 
contiguous and in reading phase. Linking is accomplished by ligation at convenient restriction 
sites. If such sites do not exist, then synthetic oligonucleotide adaptors or linkers are used, in 
conjunction with standard recombinant DNA methods. 

The term "introduce" is primarily intended to mean substitution of an existing amino acid 
residue, but may also mean insertion of an additional amino acid residue. The term "remove" 
is primarily intended to mean substitution of the amino acid residue to be removed by another 
amino acid residue, but may also mean deletion (without substitution) of the amino acid 
residue to be removed. 

The term "immunogenicity" as used in connection with a given substance is intended to 
indicate the ability of the substance to induce a response from the immune system. The 
immune response may be a cell or antibody mediated response (see, e.g., Roitt: Essential
Immunology (8th Edition, Blackwell) for further definition of immunogenicity). Normally,
reduced antibody reactivity is an indication of reduced immunogenicity.

The term "functional in vivo half-life" is used in its normal meaning, i.e. the time at which
50% of the biological activity of the polypeptide or conjugate is still present in the
body/target organ, or the time at which the activity of the polypeptide or conjugate is 50% of
the initial value. As an alternative to determining functional in vivo half-life, "serum half-life"
may be determined, i.e. the time at which 50% of the dispensed polypeptide or conjugate
molecules are still present in the circulation/plasma/bloodstream. Determination of serum
half-life is often simpler than determining the functional in vivo half-life, and the magnitude
of serum half-life is usually a good indication of the magnitude of functional in vivo half-life.
Alternative terms to serum half-life include "plasma half-life", "circulating half-life",
"serum clearance", "plasma clearance" and "clearance half-life". The polypeptide or conjugate
is cleared by the action of one or more of the kidney, the reticuloendothelial system (RES),
the spleen or the liver, by FSH-receptor-mediated elimination, or by specific or non-specific
proteolysis. Normally, clearance depends on size (relative to the cutoff for glomerular filtration),
charge, attached carbohydrate chains, and the presence of cellular receptors for the protein.
The functionality to be retained is normally selected from proliferative or receptor binding
activity. The functional in vivo half-life and the serum half-life may be determined by any
suitable method known in the art as further discussed in the Materials and Methods section
hereinafter.
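In the common simplifying case of first-order (mono-exponential) elimination, C(t) = C0·exp(-k·t), the 50% time defined above is t_half = ln 2 / k, and k can be estimated from the slope of ln C(t) against t. The sketch below assumes such single-phase decay (real profiles are often multi-phasic) and uses made-up illustration values; the same fit applies whether C(t) is a serum concentration or a measured activity. `serum_half_life` is a hypothetical helper, not the method of the Materials and Methods section.

```python
import math


def serum_half_life(times: list[float], concentrations: list[float]) -> float:
    """Estimate t_half = ln(2)/k from a log-linear least-squares fit of C(t).

    Assumes mono-exponential decay C(t) = C0 * exp(-k * t) and C > 0.
    """
    if len(times) != len(concentrations) or len(times) < 2:
        raise ValueError("need at least two paired time/concentration samples")
    logs = [math.log(c) for c in concentrations]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("sampling times must not all be equal")
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    slope = sxy / sxx  # estimates -k for first-order decay
    if slope >= 0:
        raise ValueError("concentrations are not decreasing; no half-life")
    return math.log(2) / -slope


# Illustrative data generated from a 10 h half-life (not measured values).
t = [0.0, 5.0, 10.0, 20.0, 40.0]
c = [100.0 * 0.5 ** (ti / 10.0) for ti in t]
print(round(serum_half_life(t, c), 3))  # 10.0
```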



The term "increased" as used about the functional in vivo half-life or serum half-life is used
to indicate that the relevant half-life of the conjugate or polypeptide is statistically
significantly increased relative to that of a reference molecule, such as a non-conjugated rhFSH
(recombinant hFSH), e.g. Gonal-F® (available from Serono) or Puregon® (available from
Organon), as determined under comparable conditions.

The term "renal clearance" is used in its normal meaning to indicate any clearance taking
place by the kidneys, e.g. by glomerular filtration, tubular excretion or degradation in the
tubular cells. Renal clearance depends on physical characteristics of the conjugate, including
size (diameter), symmetry, shape/rigidity and charge. A molecular weight of about 67 kDa is
considered to be an important cut-off value for renal clearance, i.e. a molecular weight above
about 67 kDa normally results in reduced renal clearance. A reduced renal clearance may be
confirmed by any suitable assay, e.g. an established in vivo assay. Typically, the renal
clearance is determined by administering a labelled (e.g. radiolabelled or fluorescently labelled)
polypeptide conjugate to a patient and measuring the label activity in urine collected from the
patient during a specified time. The reduced renal clearance is determined relative to the
corresponding non-conjugated polypeptide or the non-conjugated corresponding wild-type
polypeptide under comparable conditions.
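The two quantities discussed above, the nominal conjugate mass compared with the cut-off of about 67 kDa and the fraction of administered label recovered in urine relative to a non-conjugated reference, reduce to simple arithmetic. In the sketch below every number is a placeholder rather than data, the helper names are hypothetical, and the mass comparison deliberately ignores the shape, symmetry and charge effects that the text also lists.

```python
RENAL_CUTOFF_KDA = 67.0  # approximate cut-off stated in the text


def conjugate_mass_kda(polypeptide_kda: float, moiety_kdas: list[float]) -> float:
    """Nominal conjugate mass: polypeptide plus attached non-polypeptide moieties."""
    return polypeptide_kda + sum(moiety_kdas)


def above_renal_cutoff(mass_kda: float, cutoff_kda: float = RENAL_CUTOFF_KDA) -> bool:
    """True if the nominal mass exceeds the cut-off (size is only one factor)."""
    return mass_kda > cutoff_kda


def urinary_fraction(label_in_urine: float, label_dosed: float) -> float:
    """Fraction of administered label recovered in urine over the collection period."""
    return label_in_urine / label_dosed


def relative_renal_clearance(test_fraction: float, reference_fraction: float) -> float:
    """A ratio below 1 suggests reduced renal clearance versus the reference."""
    return test_fraction / reference_fraction


# Placeholder example: a hypothetical 30 kDa polypeptide with two 20 kDa polymers.
mass = conjugate_mass_kda(30.0, [20.0, 20.0])
print(mass, above_renal_cutoff(mass))  # 70.0 True

# Placeholder label recoveries (arbitrary units) for conjugate and reference.
ratio = relative_renal_clearance(urinary_fraction(5.0, 100.0),
                                 urinary_fraction(40.0, 100.0))
print(ratio)  # 0.125
```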

The term "FSH-α" is intended to indicate a polypeptide having qualitatively similar functions
or activities as the corresponding wildtype FSH α subunit, including the capability of forming
a dimeric polypeptide with an FSH β subunit (FSH-β), which dimeric polypeptide exhibits
FSH activity. Alternatively used terms include "FSH-α polypeptide", "FSH-α subunit", and
"modified FSH-α". Analogously, the term "FSH-β" is intended to indicate a polypeptide
having qualitatively similar functions or activities as the corresponding wildtype FSH β
subunit, including the capability of dimerizing with FSH-α and thereby forming a dimeric
polypeptide exhibiting FSH activity. Alternatively used terms include "FSH-β polypeptide",
"FSH-β subunit", and "modified FSH-β".

The term "exhibiting FSH activity" is intended to indicate that the conjugate or polypeptide
has one or more of the functions of wildtype FSH, in particular hFSH, including the capability
of binding to and activating an FSH receptor. The FSH activity is conveniently assayed
using the receptor binding assay described in the Materials and Methods section hereinafter.
The conjugate or polypeptide "exhibiting" FSH activity is considered to have such activity
when it displays a measurable function, e.g. a measurable activity. The dimeric polypeptide
exhibiting FSH activity may also be termed "FSH molecule" herein.

Conjugate of the invention 

As stated above, in a first aspect the invention relates to a polypeptide conjugate exhibiting
FSH activity, comprising i) a polypeptide comprising FSH-α and FSH-β subunits, wherein at
least one of the FSH-α and FSH-β subunits differs from the corresponding wildtype subunit
in at least one introduced or removed amino acid residue comprising an attachment group for
a non-polypeptide moiety, and ii) a non-polypeptide moiety bound to an attachment group of
the polypeptide. Examples of amino acid residues that may be introduced and/or removed are
described in further detail in the following sections.
The conjugate of the invention is the result of a generally new strategy for developing
improved molecules with FSH activity. More specifically, by removing and/or introducing an
amino acid residue comprising an attachment group for the non-polypeptide moiety it is
possible to specifically adapt the polypeptide so as to make the molecule more susceptible to
conjugation to the non-polypeptide moiety of choice, to optimize the conjugation pattern (e.g. to
ensure an optimal distribution of non-polypeptide moieties on the surface of the FSH
molecule and to ensure that only the attachment groups intended to be conjugated are present in
the molecule) and thereby obtain a new conjugate molecule which has FSH activity and in
addition one or more improved properties as compared to FSH molecules available today, in
particular increased functional in vivo half-life and/or reduced renal clearance.
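To illustrate how removing attachment groups narrows the possible conjugation pattern, the sketch below lists the amine attachment groups of hFSH-α that an amine-reactive polymer such as mPEG-SPA (see the attachment-group table) could in principle reach, namely the N-terminal amino group and each lysine side chain, before and after a single Lys-to-Arg substitution. The sequence string is assumed to reproduce SEQ ID NO 2, the helper names are hypothetical, surface accessibility is ignored, and K75R is an arbitrary illustration, not a substitution proposed in the text.

```python
# Assumed to reproduce SEQ ID NO 2 (hFSH-alpha, 92 residues).
HFSH_ALPHA = (
    "APDVQDCPEC" "TLQENPFFSQ" "PGAPILQCMG" "CCFSRAYPTP" "LRSKKTMLVQ"
    "KNVTSESTCC" "VAKSYNRVTV" "MGGFKVENHT" "ACHCSTCYYH" "KS"
)


def amine_attachment_sites(seq: str) -> list[str]:
    """Label the N-terminal amine and every Lys (1-based) as candidate -NH2 sites."""
    sites = ["N-term"]
    sites += [f"K{i + 1}" for i, aa in enumerate(seq) if aa == "K"]
    return sites


def substitute(seq: str, pos: int, wt: str, new: str) -> str:
    """Replace the residue at 1-based `pos`, checking the expected wildtype residue."""
    if seq[pos - 1] != wt:
        raise ValueError(f"expected {wt}{pos}, found {seq[pos - 1]}{pos}")
    return seq[: pos - 1] + new + seq[pos:]


print(amine_attachment_sites(HFSH_ALPHA))
# ['N-term', 'K44', 'K45', 'K51', 'K63', 'K75', 'K91']
variant = substitute(HFSH_ALPHA, 75, "K", "R")  # hypothetical K75R
print(amine_attachment_sites(variant))
# ['N-term', 'K44', 'K45', 'K51', 'K63', 'K91']
```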

In the conjugate of the invention, one or both of the FSH subunits may be modified according
to the invention. For instance, the amino acid sequence of FSH-α may be modified as
described herein, whereas FSH-β is unmodified, and vice versa. Alternatively, both FSH-α
and FSH-β may be modified according to the invention.

While the FSH-α and/or FSH-β may be of any origin, in particular mammalian origin, it is
presently preferred that they are of human origin. Accordingly, the corresponding
wildtype subunits referred to above are preferably hFSH-α and hFSH-β, respectively, with
the amino acid sequences shown in SEQ ID NO 2 and 4, respectively.

In a preferred embodiment one difference between the amino acid sequence of FSH-α and/or
FSH-β and the corresponding wildtype sequence is that at least one and preferably more, e.g.
1-15, amino acid residues comprising an attachment group for the non-polypeptide moiety ii)
have been introduced, preferably by substitution, into the amino acid sequence(s). Thereby,
for instance, shielding by non-polypeptide moieties may be achieved in different regions of
the polypeptide molecule, leading to a lower immune response, and/or the molecular weight,
shape, size and/or charge of the conjugate can be optimised. Preferably, such amino acid
residues are introduced in positions occupied by an amino acid residue having more than
25%, such as more than 50% or even more than 75%, of its side chain exposed at the surface
of the molecule.
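The surface-exposure criterion can be sketched as a simple filter over per-residue exposure fractions. The exposure values below are placeholders (in practice they would be computed from a three-dimensional structure or model of FSH), `exposed_positions` is a hypothetical helper, and the optional `exclude` set stands in for functional-site residues, such as those involved in dimerization or receptor binding, at which conjugation is undesirable.

```python
def exposed_positions(exposure: dict[int, float], threshold: float = 0.25,
                      exclude: frozenset[int] = frozenset()) -> list[int]:
    """Return positions whose fractional side-chain exposure exceeds `threshold`,
    skipping any position listed in `exclude`."""
    return sorted(p for p, frac in exposure.items()
                  if frac > threshold and p not in exclude)


# Placeholder exposure fractions keyed by 1-based position (not real data).
placeholder = {9: 0.80, 14: 0.40, 33: 0.10, 51: 0.60, 88: 0.30}
print(exposed_positions(placeholder))                            # [9, 14, 51, 88]
print(exposed_positions(placeholder, threshold=0.50))            # [9, 51]
print(exposed_positions(placeholder, exclude=frozenset({51})))   # [9, 14, 88]
```

Raising `threshold` to 0.50 or 0.75 mirrors the stricter preferences stated in the text.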

The term "one difference" as used in the present application is intended to allow for
additional differences being present. Accordingly, in addition to the specified amino acid
difference, other amino acid residues than those specified may be mutated.

In a further preferred embodiment one difference between the amino acid sequence of FSH-α
and/or FSH-β and that of the corresponding wildtype polypeptide is that at least one and
preferably more, e.g. 1-15, amino acid residues comprising an attachment group for the
non-polypeptide moiety ii) have been removed, preferably by substitution, from the amino acid
sequence. The amino acid residue to be removed is preferably one to which conjugation is
disadvantageous, e.g. an amino acid residue located at or near a functional site of the
polypeptide (since conjugation at such a site may result in inactivation or reduced FSH activity of
the resulting conjugate due to impaired receptor recognition). In the present context the term
"functional site" is intended to indicate one or more amino acid residues which are essential
for or otherwise involved in the function or performance of hFSH, in particular dimerization
and/or receptor binding and activation. Such amino acid residues are a part of a functional
site. The functional site may be determined by methods known in the art and is preferably
identified by analysis of a structure of the polypeptide complexed to a relevant receptor, such
as the hFSH receptor.
In preferred embodiments of the present invention more than one amino acid residue of the
FSH-α and/or FSH-β is altered, e.g. the alteration embraces removal as well as introduction
of amino acid residues comprising an attachment group for the non-polypeptide moiety of
choice.

Typically, in order to avoid too much disruption of the structure and function of the FSH
molecule, the total number of amino acid residues to be altered in accordance with the present
invention does not exceed 15. Preferably, the polypeptide part of the conjugate of the invention
or the polypeptide of the invention comprises an amino acid sequence which differs in
1-15 amino acid residues from the amino acid sequence shown in SEQ ID NO 2, such as in 1-8
or 2-8 amino acid residues, e.g. in 1-5 or 2-5 amino acid residues. Thus, normally the
polypeptide part of the conjugate or the polypeptide of the invention comprises an amino acid
sequence which differs from the amino acid sequence shown in SEQ ID NO 2 in 1, 2, 3, 4, 5,
6, 7, 8, 9, 10, 11, 12, 13, 14 or 15 amino acid residues.
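For substitution-only variants, the number of differing residues is a position-by-position count against the reference sequence. The sketch below assumes equal-length, ungapped sequences (insertions or deletions would require an alignment first), assumes the reference string reproduces SEQ ID NO 2, and uses hypothetical helper names; the example variant carries the E9(a)N substitution used in the notation example.

```python
# Assumed to reproduce SEQ ID NO 2 (hFSH-alpha, 92 residues).
HFSH_ALPHA = (
    "APDVQDCPEC" "TLQENPFFSQ" "PGAPILQCMG" "CCFSRAYPTP" "LRSKKTMLVQ"
    "KNVTSESTCC" "VAKSYNRVTV" "MGGFKVENHT" "ACHCSTCYYH" "KS"
)


def count_differences(reference: str, variant: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length (substitutions only)")
    return sum(1 for r, v in zip(reference, variant) if r != v)


def within_limit(reference: str, variant: str, low: int = 1, high: int = 15) -> bool:
    """True if the variant differs in `low`..`high` residues (1-15 by default)."""
    return low <= count_differences(reference, variant) <= high


# E9(a)N: replace position 9 (index 8) of hFSH-alpha by Asn.
variant = HFSH_ALPHA[:8] + "N" + HFSH_ALPHA[9:]
print(count_differences(HFSH_ALPHA, variant), within_limit(HFSH_ALPHA, variant))  # 1 True
```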

The FSH-α and/or FSH-β of the polypeptide i) is preferably any of the specific modified
FSH-α and/or FSH-β polypeptides disclosed in the subsequent sections having introduced
and/or removed amino acid residues comprising an attachment group for the relevant
non-polypeptide moiety.

The amino acid residue comprising an attachment group for a non-polypeptide moiety,
whether it is removed or introduced, is selected on the basis of the nature of the
non-polypeptide moiety of choice and, in most instances, on the basis of the method by which
conjugation between the polypeptide i) and the non-polypeptide moiety ii) is to be achieved. It
will be understood that, in order to preserve a measurable function of the modified FSH-α
and/or FSH-β, amino acid residues to be modified (by deletion or, preferably, by substitution)
are selected from those amino acid residues which are not essential for providing a measurable
activity. Accordingly, amino acid residues to be modified are different from those required
for subunit dimerization and/or receptor binding or activation. The identity of such
amino acid residues is described in the prior art (a representative part of which is identified in
the Background section above) or can be determined by a person skilled in the art using
methods known in the art.

In addition to the removal and/or introduction of amino acid residues, the FSH-α and/or
FSH-β may comprise other amino acid changes, such as substitutions, or glycosylations which
are not related to the introduction and/or removal of amino acid residues comprising an
attachment group for the non-polypeptide moiety. Examples of such additional amino acid
changes include adding part of or the entire CTP region of hCG to the C-terminus of FSH-α, or
introducing any other mutation (in particular one selected among those reported to enhance FSH
activity and/or increase the functional in vivo half-life, cf. the Background of the Invention
section herein).

Preferably, the conjugate of the present invention has one or more improved properties as
compared to hFSH, in particular as compared to rhFSH (e.g. Gonal-F® or Puregon®), including
increased functional in vivo half-life, increased serum half-life, reduced renal clearance,
reduced immunogenicity and/or increased bioavailability. Consequently, medical treatment with
a conjugate of the invention offers a number of advantages over the currently available FSH
compounds, including longer intervals between injections and fewer side effects.

Normally, the increased functional in vivo half-life is obtained as a consequence of the
conjugate having a reduced susceptibility to renal clearance as compared to hFSH. The reduced
susceptibility to renal clearance is obtained as a consequence of the size, shape/rigidity, net
charge and other characteristics of the conjugate being changed as compared to the
unconjugated polypeptide. In a preferred embodiment, the conjugate according to the invention
has a molecular weight of at least about 67 kDa, preferably at least about 70 kDa, although a
lower molecular weight may also give rise to a reduced renal clearance. In some cases, it will be
preferred to obtain only a slightly reduced renal clearance, e.g. to increase the in vivo half-life
from about 24 hours to about 3-4 days while avoiding a longer half-life of, e.g., about a week. In
such cases, the conjugate of the invention may have a molecular weight that is substantially
below about 67 kDa, but which has nevertheless been increased sufficiently to ensure the
desired reduction in renal clearance. Polymer molecules, such as PEG, have been found to be
particularly useful for adjusting the molecular weight of the conjugate. As will be explained in
further detail below, the number and size of such polymer molecules may be adapted in order to
obtain a desired renal clearance, as well as other desired properties, suitable for a given clinical
indication.
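
Since the text treats molecular weight as the adjustable quantity, the number of polymer chains needed to reach a target can be estimated by simple arithmetic. The sketch below is nominal-mass bookkeeping only: the 30 kDa starting mass is a hypothetical placeholder, not a value from this description, and the calculation ignores the fact that a PEG chain has a larger hydrodynamic size than a globular protein of the same mass, which matters for renal filtration.

```python
import math


def peg_chains_needed(base_kda: float, target_kda: float, peg_kda: float) -> int:
    """Smallest number of PEG chains of nominal mass peg_kda that lifts the
    conjugate's nominal molecular weight from base_kda to at least target_kda."""
    if peg_kda <= 0:
        raise ValueError("PEG mass must be positive")
    shortfall = target_kda - base_kda
    if shortfall <= 0:
        return 0  # already at or above the target
    return math.ceil(shortfall / peg_kda)


# Hypothetical 30 kDa starting mass; the 67 kDa target is taken from the text.
for peg in (5, 10, 20):
    print(f"{peg} kDa PEG: {peg_chains_needed(30, 67, peg)} chains")
```

Fewer, larger chains reach the same nominal mass with fewer attachment sites, which is one reason the number and size of polymer molecules are treated as separate design variables.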

In a preferred embodiment, the conjugate of the invention has a renal clearance that is reduced
by at least about 50%, such as at least about 75% or at least about 90%, as compared to the
corresponding non-conjugated polypeptide (such as hFSH or rhFSH) as determined under
comparable conditions.
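
To make the link between these clearance figures and half-life concrete, the sketch below assumes a one-compartment model in which renal clearance is the dominant elimination route and the volume of distribution stays constant, so that t½ = ln 2 · V / CL. Under those assumptions (which a real conjugate need not satisfy), a 50%, 75% or 90% reduction in clearance corresponds to a 2-, 4- or 10-fold longer half-life, and the extension from about 24 hours to about 3-4 days mentioned above corresponds to a clearance reduction of roughly 67-75%. The function names are hypothetical.

```python
import math


def half_life(volume_l: float, clearance_l_per_h: float) -> float:
    """Elimination half-life (h) in a one-compartment model: ln(2) * V / CL."""
    return math.log(2) * volume_l / clearance_l_per_h


def fold_increase(clearance_reduction: float) -> float:
    """Fold increase in half-life when clearance drops by the given fraction
    and the volume of distribution is unchanged."""
    if not 0 <= clearance_reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return 1 / (1 - clearance_reduction)


def reduction_needed(t_half_before: float, t_half_after: float) -> float:
    """Fractional clearance reduction that turns t_half_before into t_half_after."""
    return 1 - t_half_before / t_half_after


for r in (0.5, 0.75, 0.9):
    print(f"{r:.0%} less clearance -> {fold_increase(r):.1f}x half-life")

# The text's example: about 24 h extended to about 3-4 days (72-96 h).
print(f"{reduction_needed(24, 72):.0%} to {reduction_needed(24, 96):.0%}")
```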

Conjugate of the invention wherein the non-polypeptide moiety is attached to a lysine or the N-terminal amino acid residue

In a preferred embodiment the conjugate of the invention is one wherein the amino acid residue
comprising an attachment group for the non-polypeptide moiety is a lysine residue and the
non-polypeptide moiety ii) is any molecule which has lysine as an attachment group. For
instance, the non-polypeptide moiety may be a polymer molecule, in particular any of the
molecules mentioned in the section entitled "Conjugation to a polymer molecule", and
preferably selected from the group consisting of linear or branched polyethylene glycol and
polyalkylene oxide. Most preferably, the polymer molecule is mPEG-SPA or
oxycarbonyl-oxy-N-dicarboxyimide PEG (US 5,122,614).

The FSH-α and/or FSH-β having introduced and/or removed at least one lysine may
advantageously be in vivo glycosylated, e.g. using naturally occurring glycosylation sites present
in the relevant FSH polypeptide. However, in a particular embodiment the conjugate is one
wherein the amino acid sequence of FSH-α and/or FSH-β differs from that of hFSH-α and/or
hFSH-β in that an N-glycosylation site has been introduced and/or removed. Such
introduced/removed sites may be any of those described in the section entitled "Conjugate of the
invention wherein the non-polypeptide moiety is an oligosaccharide moiety".

i) Removal of lysine residues 

hFSH-α contains 6 lysine residues and hFSH-β contains 7. In order to avoid conjugation to one
or more of these lysine residues, e.g. lysine residues located at or close to the receptor-binding
site of hFSH, it may be desirable to remove at least one lysine residue. Accordingly, in one
embodiment the conjugate of the invention is one which comprises a modified FSH-α having an
amino acid sequence which differs from that of hFSH-α in the removal of at least one lysine
residue selected from the group consisting of K44(a), K45(a), K51(a), K63(a), K75(a), and
K91(a), in particular at least one amino acid residue selected from the group consisting of
K44(a), K45(a), K63(a), K75(a), and K91(a) (these residues having more than 25% of their
side chain exposed to the surface), and preferably from the group consisting of K45(a),
K63(a), K75(a), and K91(a) (these residues having more than 50% of their side chain exposed
to the surface). The FSH-β part of this conjugate may be hFSH-β or any of the modified
FSH-β polypeptides described herein.

In another embodiment the conjugate of the invention is one which comprises a modified
FSH-β having an amino acid sequence which differs from that of hFSH-β in the removal of at
least one lysine residue selected from the group consisting of K14(b), K40(b), K46(b), K49(b),
K54(b), K86(b), and K110(b), in particular at least one amino acid residue selected from the
group consisting of K14(b), K40(b), K46(b), K49(b), K54(b), K86(b), and K110(b) (these
residues having more than 25% of their side chain exposed to the surface), and preferably from
the group consisting of K46(b), K54(b), K86(b), and K110(b) (these residues having more than
50% of their side chain exposed to the surface). The FSH-α part of this conjugate may be
hFSH-α or any of the modified FSH-α polypeptides described herein.

In a further embodiment, the conjugate of the invention is one which comprises a modified
FSH-α and a modified FSH-β, each of which differs from the corresponding hFSH subunit in
the removal of at least one of the above identified lysine residues. For instance, the conjugate
of the invention may be one wherein the modified FSH-α and modified FSH-β subunits differ
from the corresponding hFSH subunits in the removal of at least one of K45(a), K63(a), K75(a),
and K91(a) and at least one of K46(b), K54(b), K86(b), and K110(b).

The removal of any of the above lysine residues is preferably achieved by substitution by any 
other amino acid residue, in particular by an arginine or a glutamine residue. 

ii) Introduction of lysine residues 

In order to obtain a more extensive conjugation it may be desirable to introduce at least one 
non-naturally occurring lysine residue in hFSH, in particular in a position occupied by an 
amino acid residue having a side chain which is more than 25% surface exposed and which is 
not part of a cystine or located at a receptor binding site. Such amino acid residues are identi- 
fied in the Examples section hereinafter or form part of the state of the art. 

Accordingly, in a further embodiment the conjugate of the invention is one which comprises
a modified FSH-α having an amino acid sequence which differs from that of hFSH-α in the
introduction of at least one lysine residue in a position selected from the group consisting of
A1(a), P2(a), D3(a), V4(a), Q5(a), D6(a), P8(a), E9(a), T11(a), L12(a), Q13(a), E14(a),
P16(a), F17(a), Q20(a), P21(a), G22(a), A23(a), P24(a), L26(a), M29(a), F33(a), R42(a),
S43(a), T46(a), L48(a), V49(a), Q50(a), N52(a), V61(a), S64(a), Y65(a), N66(a), R67(a),
V68(a), T69(a), M71(a), G72(a), G73(a), F74(a), N78(a), T80(a), A81(a), H83(a), S85(a),
T86(a), Y88(a), Y89(a), H90(a), and S92(a), in particular selected from the group consisting
of A1(a), P2(a), D3(a), V4(a), Q5(a), D6(a), P8(a), E9(a), T11(a), Q13(a), E14(a),
P16(a), F17(a), Q20(a), P21(a), G22(a), A23(a), T46(a), L48(a), V49(a), Q50(a), N52(a),
S64(a), N66(a), R67(a), T69(a), G72(a), G73(a), T86(a), Y89(a), H90(a), and S92(a) (these
residues having more than 50% of their side chain exposed to the surface), and most
preferably in the position R42(a) and/or R67(a), such as R67(a). The FSH-β part of this
conjugate may be hFSH-β or any of the modified FSH-β polypeptides described herein.

In a further embodiment the conjugate of the invention is one which comprises a modified
FSH-β having an amino acid sequence which differs from that of hFSH-β in the introduction of
at least one lysine residue in a position selected from the group consisting of N1(b), S2(b),
E4(b), L5(b), T6(b), N7(b), I8(b), T9(b), E15(b), E16(b), R18(b), F19(b), I21(b), S22(b),
N24(b), Y31(b), Y33(b), R35(b), D36(b), L37(b), Y39(b), D41(b), P42(b), A43(b), R44(b),
P45(b), I47(b), E55(b), L56(b), V57(b), Y58(b), E59(b), T60(b), V61(b), R62(b), P64(b),
G65(b), A67(b), H68(b), H69(b), D71(b), L73(b), Y74(b), T75(b), T80(b), Q81(b), H83(b),
G85(b), D88(b), S89(b), D90(b), S91(b), D93(b), T95(b), V96(b), R97(b), G98(b), L99(b),
G100(b), Y103(b), S105(b), F106(b), G107(b), E108(b), M109(b), and E111(b), in particular
selected from the group consisting of N1(b), N7(b), T9(b), E15(b), E16(b), R18(b),
F19(b), N24(b), Y33(b), D41(b), P42(b), A43(b), R44(b), P45(b), I47(b), E55(b), V57(b),
Y58(b), E59(b), R62(b), P64(b), G65(b), A67(b), H68(b), H69(b), D71(b), L73(b), T75(b),
Q81(b), H83(b), D88(b), S89(b), D90(b), S91(b), T95(b), R97(b), G98(b), L99(b), G100(b),
Y103(b), S105(b), F106(b), G107(b), E108(b), M109(b), and E111(b) (these residues having
more than 50% of their side chain exposed to the surface), and most preferably selected from
the group consisting of R18(b), R35(b), R44(b), R62(b), and R97(b), such as R18(b), R44(b),
R62(b), and R97(b). The FSH-α part of this conjugate may be hFSH-α or any of the modified
FSH-α polypeptides described herein.

In a further embodiment, the conjugate of the invention is one which comprises a modified
FSH-α and a modified FSH-β, each of which differs from the corresponding hFSH subunit in
the introduction of a lysine residue in at least one of the above identified positions. For
instance, the conjugate of the invention may be one wherein the modified FSH-α and modified
FSH-β subunits differ from the corresponding hFSH subunits in that a lysine residue has been
introduced in at least one of R42(a) and R67(a), and at least one of R18(b), R35(b), R44(b),
R62(b), and R97(b), and more preferably in R67(a), and at least one of R18(b), R44(b),
R62(b), and R97(b).

The introduction of a lysine residue is preferably achieved by substitution of any of the above 
amino acid residues. 

iii) Introduction and removal of lysine residues

In a preferred embodiment the conjugate of the invention comprises at least one introduced 
lysine residue, in particular any of those described in the section entitled "Introduction of 
lysine residues", and at least one removed lysine residue, in particular any of those described 
in the section entitled "Removal of lysine residues". 

Preferably, the conjugate comprises a modified FSH-α and/or a modified FSH-β, each of which
differs from the corresponding hFSH-α or hFSH-β in at least one introduced and at least one
removed lysine residue, wherein the lysine residue is introduced by substitution of an amino
acid residue selected from the group consisting of R42(a), R67(a), R18(b), R35(b), R44(b),
R62(b), and R97(b), and more preferably from the group consisting of R67(a), R18(b), R44(b),
R62(b), and R97(b), and the lysine residue is removed from a position selected from the group
consisting of K45(a), K63(a), K75(a), K91(a), K46(b), K54(b), K86(b), and K110(b), the
removal preferably being achieved by substitution by any other amino acid residue, in particular
by an arginine residue.
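
A combined design of this kind can be checked mechanically: parse each substitution written in the X<position>(a|b)Y notation used in this description, verify the wild-type residue, apply it, and list the lysine positions and primary amines (lysine ε-amino groups plus the N-terminal α-amino group) left available to amine-reactive reagents such as mPEG-SPA. This is a minimal sketch; the parser, the function names and the particular combination of substitutions are illustrative choices, and the mature chain sequences are assumed to correspond to SEQ ID NO 2 and SEQ ID NO 4. It says nothing about which amines are actually reactive or surface exposed in the folded dimer.

```python
import re

# Mature chains, assumed to correspond to SEQ ID NO 2 (alpha) and SEQ ID NO 4 (beta).
CHAINS = {
    "a": "APDVQDCPECTLQENPFFSQPGAPILQCMGCCFSRAYPTPLRSKKTMLVQ"
         "KNVTSESTCCVAKSYNRVTVMGGFKVENHTACHCSTCYYHKS",
    "b": "NSCELTNITIAIEKEECRFCISINTTWCAGYCYTRDLVYKDPARPKIQKT"
         "CTFKELVYETVRVPGCAHHADSLYTYPVATQCHCGKCDSDSTDCTVRGLG"
         "PSYCSFGEMKE",
}

# Matches the notation used in the text, e.g. "R67(a)K" or "K54(b)R".
MUTATION = re.compile(r"^([A-Z])(\d+)\(([ab])\)([A-Z])$")


def apply_mutations(chains, mutations):
    """Apply substitutions, checking each wild-type residue before replacing it."""
    seqs = {name: list(seq) for name, seq in chains.items()}
    for mutation in mutations:
        match = MUTATION.match(mutation)
        if match is None:
            raise ValueError(f"unrecognised mutation: {mutation}")
        wild_type, position, chain, new = match.groups()
        index = int(position) - 1
        if seqs[chain][index] != wild_type:
            raise ValueError(f"{mutation}: found {seqs[chain][index]} at that position")
        seqs[chain][index] = new
    return {name: "".join(seq) for name, seq in seqs.items()}


def lysine_positions(seq):
    """1-based positions of all lysine residues."""
    return [i + 1 for i, residue in enumerate(seq) if residue == "K"]


def amine_groups(seq):
    """Lysine epsilon-amino groups plus the one N-terminal alpha-amino group."""
    return seq.count("K") + 1


design = ["R67(a)K", "K45(a)R", "K63(a)R", "R18(b)K", "R44(b)K", "K54(b)R"]
mutant = apply_mutations(CHAINS, design)
for name in "ab":
    print(name, lysine_positions(mutant[name]), amine_groups(mutant[name]))
```

Checking the wild-type residue before each substitution catches numbering slips of the kind that OCR or transcription can introduce into long mutation lists.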

N-terminal PEGylation of FSH

As indicated above, one aspect of the invention relates to a polypeptide conjugate wherein at
least one of the FSH-α and FSH-β subunits comprises a polymer molecule bound to the
N-terminal thereof. Preferably, the polymer is a polyethylene glycol (PEG) such as mPEG; see
the general discussion below regarding conjugates comprising polyethylene glycol-derived
polymers.

In the case of N-terminally PEGylated FSH conjugates according to the invention, the
respective subunits may comprise one or more of the modifications disclosed elsewhere herein,
or one or both of the subunits may be the respective wildtype subunits with a PEG-derived
polymer being attached at the N-terminal. Thus, the polypeptide conjugate may be one in
which the FSH-α subunit comprises hFSH-α having the sequence shown in SEQ ID NO 2,
and/or in which the FSH-β subunit comprises hFSH-β having the sequence shown in SEQ ID
NO 4. In a particular embodiment, both of the subunits correspond to the respective wildtype
hFSH subunits, although with either the α or β subunit, or both, being N-terminally
PEGylated.

Aldehyde-activated PEG and reduction using NaBH3CN have been used to selectively PEGylate
the N-terminal α-amino group of proteins (see for instance US 5,824,784 regarding N-terminal
PEGylation of G-CSF). The N-terminus of the α and/or the β chain of wildtype FSH or a
modified form of FSH can be PEGylated using similar methods. Reaction materials include
purified FSH or a modified form of FSH, methoxy-PEG-aldehyde (M-PEG-CHO), and
NaBH3CN. In order to optimise yield, one may for instance vary the molar ratio of FSH,
M-PEG-CHO and NaBH3CN, the time allowed for establishment of the Schiff base equilibrium
(reaction between FSH and M-PEG-CHO before addition of NaBH3CN), the reaction time after
addition of NaBH3CN, the temperature, the pH, or the reaction volume. The yield of PEGylated
FSH forms may be analysed using Western blotting, mass spectrometry and N-terminal
sequencing. In order to restrict PEGylation to only one of the two N-termini in FSH,
PEGylation of the α or β chain may be selectively prevented by addition of a glutamine to its
N-terminus. Spontaneous cyclisation of such an N-terminal glutamine residue (to pyroglutamate)
will render it inaccessible for PEGylation. Such a glutamine residue may subsequently be
removed using a pyroglutamyl aminopeptidase (for instance EC 3.4.19.3).
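
The optimisation described above amounts to screening combinations of reaction parameters. A minimal sketch of a full-factorial layout follows; the factor names and levels are hypothetical placeholders (the text lists which parameters to vary, not their values), and when the number of runs grows a fractional design may be preferable. Each condition would then be scored, e.g. by the yield of PEGylated FSH determined with the analytical methods mentioned above.

```python
from itertools import product

# Placeholder levels only; the text names the factors to vary but gives no values.
FACTORS = {
    "peg_to_fsh_molar_ratio": [2, 5, 10],
    "schiff_base_minutes": [30, 60],
    "ph": [5.0, 6.0, 7.0],
    "temperature_c": [4, 22],
}


def full_factorial(factors):
    """Every combination of factor levels, one dict per reaction condition."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*(factors[n] for n in names))]


conditions = full_factorial(FACTORS)
print(len(conditions))  # 36
print(conditions[0])
```

Further factors named in the text, such as reaction time after reductant addition or reaction volume, can be added as extra dictionary entries without changing the function.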

Conjugate of the invention having a non-lysine residue as an attachment group 
Based on the present disclosure the skilled person will be aware that amino acid residues
comprising other attachment groups may be introduced into and/or removed from FSH-α
and/or FSH-β, using the same approach as that illustrated above for lysine residues. For
instance, one or more amino acid residues comprising an acid group (glutamic acid and aspartic
acid), asparagine, tyrosine and cysteine may be introduced into positions which in hFSH are
occupied by amino acid residues having surface exposed side chains (i.e. the positions
mentioned above as being of interest for introduction of lysine residues), or removed
(preferably by substitution by any other amino acid residue). Preferably, Asp is substituted by
Asn, Glu by Gln, Tyr by Phe, and Cys by Ser.
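
The preferred removal substitutions can be written as a lookup table. The sketch below is illustrative only: the toy peptide and function name are hypothetical, Lys to Arg is taken from the lysine section above (which also mentions Gln), and the table does not check whether a residue is surface exposed, part of a cystine or needed for receptor binding, all of which must be considered before a residue is altered.

```python
# Preferred replacements from the text: Asp->Asn, Glu->Gln, Tyr->Phe, Cys->Ser,
# plus Lys->Arg from the lysine section.
REMOVAL_SUBSTITUTE = {"D": "N", "E": "Q", "Y": "F", "C": "S", "K": "R"}


def remove_attachment_group(seq: str, position: int):
    """Return (mutation label, new sequence) removing the attachment group at a
    1-based position by the preferred substitution."""
    residue = seq[position - 1]
    if residue not in REMOVAL_SUBSTITUTE:
        raise ValueError(f"{residue}{position} carries none of the listed attachment groups")
    new = REMOVAL_SUBSTITUTE[residue]
    return f"{residue}{position}{new}", seq[: position - 1] + new + seq[position:]


peptide = "AKDEYC"  # hypothetical toy sequence
for pos in range(2, 7):
    print(remove_attachment_group(peptide, pos)[0])  # K2R, D3N, E4Q, Y5F, C6S
```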

Conjugate of the invention wherein the non-polypeptide moiety is an oligosaccharide moiety 
It has been found that N-glycosylation is important for FSH activity and also that the extent
and type of oligosaccharide moiety attached by in vivo glycosylation is important for the
functional in vivo half-life of the glycosylated FSH. In order to obtain a different, optionally
increased, glycosylation it is desirable to introduce at least one glycosylation site. Accordingly,
in a further aspect the invention relates to a polypeptide conjugate exhibiting FSH activity
comprising i) a polypeptide comprising FSH-α and FSH-β, wherein the amino acid sequence of
said FSH-α and/or FSH-β differs from that of the corresponding wild type FSH, preferably
hFSH, in at least one introduced N-glycosylation site and ii) an oligosaccharide moiety.

A suitable N-glycosylation site may be created by introducing, preferably by substitution,
an asparagine residue in a position occupied by an amino acid residue having more than 25% 
of its side chain exposed at the surface of the polypeptide, which position does not have a 
proline residue located in position +1 or +3 therefrom. If the amino acid residue located in 
position +2 is a serine or threonine, no further amino acid substitution is required. However, 
if this position is occupied by a different amino acid residue, a serine or threonine residue 
needs to be introduced. 
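
The site-creation rule in the preceding paragraph can be expressed as a small check. In the sketch below the function names are hypothetical, the mature chain sequences are assumed to correspond to SEQ ID NO 2 and SEQ ID NO 4, and the 25% side-chain exposure criterion is not modelled because it requires structural data. The scanner recovers the naturally occurring sites at N52(a), N78(a), N7(b) and N24(b), and the designer reproduces entries from the mutation lists, such as E9(a)N and L12(a)N+E14(a)T, while rejecting a position with a proline at +3.

```python
# Mature chains, assumed to correspond to SEQ ID NO 2 (alpha) and SEQ ID NO 4 (beta).
HFSH_ALPHA = (
    "APDVQDCPECTLQENPFFSQPGAPILQCMGCCFSRAYPTPLRSKKTMLVQ"
    "KNVTSESTCCVAKSYNRVTVMGGFKVENHTACHCSTCYYHKS"
)
HFSH_BETA = (
    "NSCELTNITIAIEKEECRFCISINTTWCAGYCYTRDLVYKDPARPKIQKT"
    "CTFKELVYETVRVPGCAHHADSLYTYPVATQCHCGKCDSDSTDCTVRGLG"
    "PSYCSFGEMKE"
)


def has_site(seq: str, pos: int) -> bool:
    """True if an N-X-S/T motif starts at 1-based pos with no Pro at +1 or +3."""
    i = pos - 1
    if i + 2 >= len(seq):
        return False
    if seq[i] != "N" or seq[i + 1] == "P" or seq[i + 2] not in "ST":
        return False
    return i + 3 >= len(seq) or seq[i + 3] != "P"


def sites(seq: str) -> list:
    """All 1-based positions carrying such a motif."""
    return [pos for pos in range(1, len(seq) + 1) if has_site(seq, pos)]


def design_site(seq: str, pos: int, hydroxy: str = "T"):
    """Substitutions creating a site at pos, or None if Pro sits at +1 or +3.

    The >25% side-chain exposure criterion is not modelled here.
    """
    i = pos - 1
    if i + 2 >= len(seq):
        return None  # no room for the Ser/Thr at +2
    if seq[i + 1] == "P" or (i + 3 < len(seq) and seq[i + 3] == "P"):
        return None
    changes = []
    if seq[i] != "N":
        changes.append(f"{seq[i]}{pos}N")
    if seq[i + 2] not in "ST":
        changes.append(f"{seq[i + 2]}{pos + 2}{hydroxy}")
    return changes


print(sites(HFSH_ALPHA), sites(HFSH_BETA))  # naturally occurring sites
print(design_site(HFSH_ALPHA, 9))           # E9N alone suffices (T11 at +2)
print(design_site(HFSH_ALPHA, 12))          # needs L12N plus E14T
print(design_site(HFSH_ALPHA, 13))          # rejected: P16 at +3
```

The same scanner can confirm the removal of a native site, e.g. that substituting N78(a) by Q leaves only the site at N52(a).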

Preferably, the conjugate according to this embodiment is one which comprises a modified
FSH-α having an amino acid sequence which differs from that of hFSH-α in the introduction
of at least one N-glycosylation site by a mutation selected from the group consisting of
P2(a)N+V4(a)S, P2(a)N+V4(a)T, D3(a)N+Q5(a)S, D3(a)N+Q5(a)T, V4(a)N+D6(a)S,
V4(a)N+D6(a)T, D6(a)N+P8(a)S, D6(a)N+P8(a)T, E9(a)N+T11(a)S, E9(a)N,
T11(a)N+Q13(a)S, T11(a)N+Q13(a)T, L12(a)N+E14(a)S, L12(a)N+E14(a)T,
E14(a)N+P16(a)S, E14(a)N+P16(a)T, P16(a)N+F18(a)S, P16(a)N+F18(a)T, F17(a)N,
F17(a)N+S19(a)T, G22(a)N+P24(a)S, G22(a)N+P24(a)T, P24(a)N+L26(a)S,
P24(a)N+L26(a)T, F33(a)N+R35(a)S, F33(a)N+R35(a)T, R42(a)N+K44(a)S,
R42(a)N+K44(a)T, S43(a)N+K45(a)S, S43(a)N+K45(a)T, K44(a)N+T46(a)S, K44(a)N,
K45(a)N+M47(a)S, K45(a)N+M47(a)T, T46(a)N+L48(a)S, T46(a)N+L48(a)T,
L48(a)N+Q50(a)S, L48(a)N+Q50(a)T, V49(a)N+K51(a)S, V49(a)N+K51(a)T,
Q50(a)N+N52(a)S, Q50(a)N+N52(a)T, V61(a)N+K63(a)S, V61(a)N+K63(a)T,
K63(a)N+Y65(a)S, K63(a)N+Y65(a)T, S64(a)N+N66(a)S, S64(a)N+N66(a)T,
Y65(a)N+R67(a)S, Y65(a)N+R67(a)T, V68(a)S, V68(a)T, R67(a)N+T69(a)S, R67(a)N,
T69(a)N+M71(a)S, T69(a)N+M71(a)T, M71(a)N+G73(a)S, M71(a)N+G73(a)T,
G72(a)N+F74(a)S, G72(a)N+F74(a)T, G73(a)N+K75(a)S, G73(a)N+K75(a)T,
F74(a)N+V76(a)S, F74(a)N+V76(a)T, K75(a)N+E77(a)S, K75(a)N+E77(a)T,
A81(a)N+H83(a)S, A81(a)N+H83(a)T, H83(a)N, T86(a)N+Y88(a)S, T86(a)N+Y88(a)T,
Y88(a)N+H90(a)S, Y88(a)N+H90(a)T, Y89(a)N+K91(a)S, Y89(a)N+K91(a)T, H90(a)N
and H90(a)N+S92(a)T, more preferably from the group consisting of V68(a)S, V68(a)T,
E9(a)N, F17(a)N, K44(a)N, R67(a)N, H83(a)N and H90(a)N, even more preferably from the
group consisting of P2(a)N+V4(a)S, P2(a)N+V4(a)T, D3(a)N+Q5(a)S, D3(a)N+Q5(a)T,
V4(a)N+D6(a)S, V4(a)N+D6(a)T, D6(a)N+P8(a)S, D6(a)N+P8(a)T, E9(a)N+T11(a)S,
E9(a)N, T11(a)N+Q13(a)S, T11(a)N+Q13(a)T, E14(a)N+P16(a)S, E14(a)N+P16(a)T,
P16(a)N+F18(a)S, P16(a)N+F18(a)T, F17(a)N, F17(a)N+S19(a)T, G22(a)N+P24(a)S,
G22(a)N+P24(a)T, K45(a)N+M47(a)S, K45(a)N+M47(a)T, T46(a)N+L48(a)S,
T46(a)N+L48(a)T, L48(a)N+Q50(a)S, L48(a)N+Q50(a)T, V49(a)N+K51(a)S,
V49(a)N+K51(a)T, Q50(a)N+N52(a)S, Q50(a)N+N52(a)T, K63(a)N+Y65(a)S,
K63(a)N+Y65(a)T, S64(a)N+N66(a)S, S64(a)N+N66(a)T, V68(a)S, V68(a)T,
R67(a)N+T69(a)S, R67(a)N, T69(a)N+M71(a)S, T69(a)N+M71(a)T, G72(a)N+F74(a)S,
G72(a)N+F74(a)T, G73(a)N+K75(a)S, G73(a)N+K75(a)T, K75(a)N+E77(a)S,
K75(a)N+E77(a)T, T86(a)N+Y88(a)S, T86(a)N+Y88(a)T, Y89(a)N+K91(a)S,
Y89(a)N+K91(a)T, H90(a)N, and H90(a)N+S92(a)T (having more than 50% side chain
accessibility), and still more preferably from the group consisting of E9(a)N, F17(a)N,
R67(a)N, and H90(a)N. The FSH-β part of this conjugate may be hFSH-β or any of the
modified FSH-β polypeptides described herein.

Alternatively or additionally, the conjugate according to this embodiment comprises a modified
FSH-β having an amino acid sequence which differs from that of hFSH-β in the introduction of
at least one N-glycosylation site by a mutation selected from the group consisting of
S2(b)N+E4(b)S, S2(b)N+E4(b)T, E4(b)N+T6(b)S, E4(b)N, L5(b)N+N7(b)S,
L5(b)N+N7(b)T, T6(b)N+I8(b)S, T6(b)N+I8(b)T, I8(b)N+I10(b)S, I8(b)N+I10(b)T,
T9(b)N+A11(b)S, T9(b)N+A11(b)T, K14(b)N+E16(b)S, K14(b)N+E16(b)T,
F19(b)N+I21(b)S, F19(b)N+I21(b)T, I21(b)N+I23(b)S, I21(b)N+I23(b)T,
S22(b)N+N24(b)S, S22(b)N+N24(b)T, Y31(b)N+Y33(b)S, Y31(b)N+Y33(b)T,
Y33(b)N+R35(b)S, Y33(b)N+R35(b)T, R35(b)N+L37(b)S, R35(b)N+L37(b)T,
D36(b)N+V38(b)S, D36(b)N+V38(b)T, L37(b)N+Y39(b)S, L37(b)N+Y39(b)T,
K40(b)N+P42(b)S, K40(b)N+P42(b)T, A43(b)N+P45(b)S, A43(b)N+P45(b)T,
P45(b)N+I47(b)S, P45(b)N+I47(b)T, K46(b)N+Q48(b)S, K46(b)N+Q48(b)T,
I47(b)N+K49(b)S, I47(b)N+K49(b)T, K54(b)N+L56(b)S, K54(b)N+L56(b)T,
E55(b)N+V57(b)S, E55(b)N+V57(b)T, L56(b)N+Y58(b)S, L56(b)N+Y58(b)T,
V57(b)N+E59(b)S, V57(b)N+E59(b)T, Y58(b)N+T60(b)S, Y58(b)N, E59(b)N+V61(b)S,
E59(b)N+V61(b)T, T60(b)N+R62(b)S, T60(b)N+R62(b)T, R62(b)N+P64(b)S,
R62(b)N+P64(b)T, G65(b)N+A67(b)S, G65(b)N+A67(b)T, A67(b)N+H69(b)S,
A67(b)N+H69(b)T, H68(b)N+A70(b)S, H68(b)N+A70(b)T, H69(b)N+D71(b)S,
H69(b)N+D71(b)T, D71(b)N+L73(b)S, D71(b)N+L73(b)T, L73(b)N+T75(b)S, L73(b)N,
T75(b)N+P77(b)S, T75(b)N+P77(b)T, H83(b)N+G85(b)S, H83(b)N+G85(b)T,
K86(b)N+D88(b)S, K86(b)N+D88(b)T, D88(b)N+D90(b)S, D88(b)N+D90(b)T, S89(b)N,
S89(b)N+S91(b)T, D90(b)N+T92(b)S, D90(b)N, S91(b)N+D93(b)S, S91(b)N+D93(b)T,
D93(b)N+T96(b)S, D93(b)N, T95(b)N+R97(b)S, T95(b)N+R97(b)T, V96(b)N+G98(b)S,
V96(b)N+G98(b)T, R97(b)N+L99(b)S, R97(b)N+L99(b)T, L99(b)N+P101(b)S,
L99(b)N+P101(b)T, Y103(b)N, Y103(b)N+S105(b)T, S105(b)N+G107(b)S,
S105(b)N+G107(b)T, F106(b)N+E108(b)S, F106(b)N+E108(b)T, G107(b)N+M109(b)S,
G107(b)N+M109(b)T, E108(b)N+K110(b)S, E108(b)N+K110(b)T, M109(b)N+E111(b)S,
and M109(b)N+E111(b)T, more preferably from the group consisting of E4(b)N, Y58(b)N,
L73(b)N, S89(b)N, D90(b)N, D93(b)N, and Y103(b)N, even more preferably from the group
consisting of F19(b)N+I21(b)S, F19(b)N+I21(b)T, Y33(b)N+R35(b)S, Y33(b)N+R35(b)T,
A43(b)N+P45(b)S, A43(b)N+P45(b)T, P45(b)N+I47(b)S, P45(b)N+I47(b)T,
K46(b)N+Q48(b)S, K46(b)N+Q48(b)T, I47(b)N+K49(b)S, I47(b)N+K49(b)T,
K54(b)N+L56(b)S, K54(b)N+L56(b)T, E55(b)N+V57(b)S, E55(b)N+V57(b)T,
V57(b)N+E59(b)S, V57(b)N+E59(b)T, Y58(b)N+T60(b)S, Y58(b)N, E59(b)N+V61(b)S,
E59(b)N+V61(b)T, R62(b)N+P64(b)S, R62(b)N+P64(b)T, G65(b)N+A67(b)S,
G65(b)N+A67(b)T, A67(b)N+H69(b)S, A67(b)N+H69(b)T, H68(b)N+A70(b)S,
H68(b)N+A70(b)T, H69(b)N+D71(b)S, H69(b)N+D71(b)T, D71(b)N+L73(b)S,
D71(b)N+L73(b)T, L73(b)N+T75(b)S, L73(b)N, T75(b)N+P77(b)S, T75(b)N+P77(b)T,
H83(b)N+G85(b)S, H83(b)N+G85(b)T, K86(b)N+D88(b)S, K86(b)N+D88(b)T,
D88(b)N+D90(b)S, D88(b)N+D90(b)T, S89(b)N, S89(b)N+S91(b)T, D90(b)N+T92(b)S,
D90(b)N, S91(b)N+D93(b)S, S91(b)N+D93(b)T, T95(b)N+R97(b)S, T95(b)N+R97(b)T,
R97(b)N+L99(b)S, R97(b)N+L99(b)T, L99(b)N+P101(b)S, L99(b)N+P101(b)T,
Y103(b)N, Y103(b)N+S105(b)T, S105(b)N+G107(b)S, S105(b)N+G107(b)T,
F106(b)N+E108(b)S, F106(b)N+E108(b)T, G107(b)N+M109(b)S, G107(b)N+M109(b)T,
E108(b)N+K110(b)S, E108(b)N+K110(b)T, M109(b)N+E111(b)S, and
M109(b)N+E111(b)T (having more than 50% side chain accessibility), and still more
preferably from the group consisting of Y58(b)N, L73(b)N, S89(b)N, D90(b)N, and Y103(b)N.
The FSH-α part of this conjugate may be hFSH-α or any of the modified FSH-α
polypeptides described herein.

The FSH-α and/or FSH-β polypeptide may further differ from hFSH-α and/or hFSH-β in at
least one removed, naturally occurring N-glycosylation site. In particular, FSH-α may comprise
a substitution of N78(a) and/or T80(a) by any other amino acid residue, and/or FSH-β may
comprise a substitution of N7(b), T9(b), N24(b) and/or T26(b) by any other amino acid residue.
Preferably, the N residue is substituted by Q or D, and the T residue by A or G.

Furthermore, the FSH-α of the conjugate according to this embodiment (having at least one of
the above mentioned N-glycosylation site modifications) may differ from hFSH-α in the
removal of at least one lysine residue selected from the group consisting of K44(a), K45(a),
K51(a), K63(a), K75(a), and K91(a), in particular at least one amino acid residue selected
from the group consisting of K44(a), K45(a), K63(a), K75(a), and K91(a) (these residues
having more than 25% of their side chain exposed to the surface), and preferably from the
group consisting of K45(a), K63(a), K75(a), and K91(a) (these residues having more than
50% of their side chain exposed to the surface).

An alternative embodiment of this aspect of the invention is one in which at least one of said
FSH-α and FSH-β subunits comprises at least one introduced N- or O-glycosylation site at
the N-terminal thereof, and wherein the at least one introduced glycosylation site is
glycosylated. In this case, the respective subunits may comprise one or more of the
modifications disclosed elsewhere herein, or one or both of the subunits may be the respective
wildtype subunits, but having the at least one introduced N-terminal glycosylation site. Thus,
the polypeptide conjugate may be one in which the FSH-α subunit comprises hFSH-α having the
sequence shown in SEQ ID NO 2, and/or in which the FSH-β subunit comprises hFSH-β having
the sequence shown in SEQ ID NO 4. In a particular embodiment, both of the subunits
correspond to the respective wildtype hFSH subunits, although with either the α or β subunit,
or both, having an introduced N-terminal glycosylation site.

The introduced glycosylation site may be of the type described elsewhere herein; see the
discussion of glycosylation under the general discussion of attachment groups above. A
non-limiting example of a suitable glycosylation site for introduction at the N-terminal is the
sequence Ala-Asn-Ile-Thr-Val-Asn-Ile-Thr-Val, e.g. for insertion upstream of a mature FSH-α
sequence.
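
Only the N-linked part of such an extension can be checked from sequence alone, since O-glycosylation has no comparably simple consensus motif. As a sketch, assuming the mature hFSH-α sequence below corresponds to SEQ ID NO 2, a regular-expression scan applying the same rule as for introduced N-glycosylation sites (N-X-S/T with no proline at +1 or +3) shows that the example extension adds two motifs ahead of the native sites, which shift by nine positions. Whether those motifs are actually glycosylated in a given host cell is an experimental question.

```python
import re

# Lookahead so overlapping candidates are all reported: N, then not-P, then S/T,
# then not-P (or end of chain).
SEQUON = re.compile(r"(?=N[^P][ST](?:[^P]|$))")

EXTENSION = "ANITVNITV"  # Ala-Asn-Ile-Thr-Val-Asn-Ile-Thr-Val, from the text
# Mature hFSH-alpha; assumed here to correspond to SEQ ID NO 2.
HFSH_ALPHA = (
    "APDVQDCPECTLQENPFFSQPGAPILQCMGCCFSRAYPTPLRSKKTMLVQ"
    "KNVTSESTCCVAKSYNRVTVMGGFKVENHTACHCSTCYYHKS"
)


def sequon_positions(seq: str) -> list:
    """1-based positions of N-X-S/T motifs with no proline at +1 or +3."""
    return [m.start() + 1 for m in SEQUON.finditer(seq)]


extended = EXTENSION + HFSH_ALPHA
print(sequon_positions(HFSH_ALPHA))  # [52, 78]
print(sequon_positions(extended))    # [2, 6, 61, 87]
```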

It will be understood that in order to prepare a conjugate according to this aspect the
polypeptide i) must be expressed in a glycosylating host cell capable of attaching
oligosaccharide moieties at the glycosylation site(s), or alternatively be subjected to in vitro
glycosylation. Examples of glycosylating host cells are given in the section further below
entitled "Conjugation to an oligosaccharide moiety".

In addition to a carbohydrate molecule, the conjugate according to the aspect of the invention
described in the present section may contain additional non-polypeptide moieties different
from O-linked or N-linked carbohydrate moieties, in particular a polymer molecule as
described herein conjugated to one or more attachment groups present in the polypeptide part of
the conjugate. This is particularly relevant when a lysine residue (or any other amino acid
residue comprising an attachment group for the non-polypeptide moiety in question) has been
introduced and/or removed.

It will be understood that any of the amino acid changes, in particular substitutions, specified
in this section can be combined with any of the amino acid changes, in particular
substitutions, specified in the other sections herein disclosing specific amino acid changes.

Non-polypeptide moiety of the conjugate of the invention

As indicated further above, the non-polypeptide moiety of the conjugate of the invention is
preferably selected from the group consisting of a polymer molecule, a lipophilic compound,
an oligosaccharide moiety (by way of in vivo glycosylation) and an organic derivatizing
agent. All of these agents may confer desirable properties on the polypeptide part of the
conjugate, in particular an increased functional in vivo half-life and/or an increased serum
half-life. The polypeptide part of the conjugate is normally conjugated to only one type of
non-polypeptide moiety, but may also be conjugated to two or more different types of
non-polypeptide moieties, e.g. to a polymer molecule and an oligosaccharide moiety, to a
lipophilic group and an oligosaccharide moiety, to an organic derivatizing agent and an
oligosaccharide moiety, to a lipophilic group and a polymer molecule, etc. The conjugation to
two or more different non-polypeptide moieties may be done simultaneously or sequentially.

Polypeptide of the invention 

In a further aspect the invention relates to a modified FSH-α or a modified FSH-β
polypeptide constituting part of a conjugate of the invention. The modified FSH-α and FSH-β
are preferably glycosylated and thus further comprise N-linked and/or O-linked oligosaccharide
moieties. Specific modified FSH-α and FSH-β polypeptides of the invention are those
described in the section entitled "Conjugate of the invention".

Methods of preparing a conjugate of the invention 

In the following sections "Conjugation to a lipophilic compound", "Conjugation to a polymer 
molecule", "Conjugation to an oligosaccharide moiety" and "Conjugation to an organic deri- 
vatizing agent", conjugation to specific types of non-polypeptide moieties is described. 

Conjugation to a lipophilic compound 

The polypeptide and the lipophilic compound may be conjugated to each other, either directly
or by use of a linker. The lipophilic compound may be a natural compound such as a saturated
or unsaturated fatty acid, a fatty acid diketone, a terpene, a prostaglandin, a vitamin, a
carotenoid or steroid, or a synthetic compound such as a carboxylic acid, an alcohol, an amine
or a sulphonic acid with one or more alkyl, aryl, alkenyl or other multiply unsaturated groups.
The conjugation between the polypeptide and the lipophilic compound, optionally through a
linker, may be done according to methods known in the art, e.g. as described by Bodanszky in
Peptide Synthesis, John Wiley, New York, 1976 and in WO 96/12505.

Conjugation to a polymer molecule 

The polymer molecule to be coupled to the polypeptide may be any suitable polymer molecule,
such as a natural or synthetic homo-polymer or hetero-polymer, typically with a molecular
weight in the range of 300-50,000 Da, such as 300-20,000 Da, more preferably in the range of
500-10,000 Da, even more preferably in the range of 500-5000 Da. Examples of homo-polymers
include a polyol (i.e. poly-OH), a polyamine (i.e. poly-NH2) and a polycarboxylic acid (i.e.
poly-COOH). A hetero-polymer is a polymer which comprises different coupling groups, such as
a hydroxyl group and an amine group.

Examples of suitable polymer molecules include polymer molecules selected from the group
consisting of polyalkylene oxide (PAO), including polyalkylene glycol (PAG), such as
polyethylene glycol (PEG) and polypropylene glycol (PPG), branched PEGs, poly(vinyl alcohol)
(PVA), poly-carboxylate, poly(vinylpyrrolidone), polyethylene-co-maleic acid anhydride,
polystyrene-co-maleic acid anhydride, dextran, including carboxymethyl-dextran, or any
other biopolymer suitable for reducing immunogenicity and/or increasing functional in vivo
half-life and/or serum half-life. Another example of a polymer molecule is human albumin or
another abundant plasma protein. Generally, polyalkylene glycol-derived polymers are
biocompatible, non-toxic, non-antigenic, non-immunogenic, have various water solubility
properties, and are easily excreted from living organisms.

PEG is the preferred polymer molecule, since it has only a few reactive groups capable of
cross-linking compared with, e.g., polysaccharides such as dextran. In particular,
monofunctional PEG, e.g. methoxypolyethylene glycol (mPEG), is of interest since its cou- 
pling chemistry is relatively simple (only one reactive group is available for conjugating with 
attachment groups on the polypeptide). Consequently, the risk of cross-linking is eliminated, 
the resulting polypeptide conjugates are more homogeneous and the reaction of the polymer 
molecules with the polypeptide is easier to control. 

To effect covalent attachment of the polymer molecule(s) to the polypeptide, the hydroxyl
end groups of the polymer molecule must be provided in activated form, i.e. with reactive
functional groups. Suitable activated polymer molecules are commercially available, e.g. 
from Shearwater Polymers, Inc., Huntsville, AL, USA. Alternatively, the polymer molecules 
can be activated by conventional methods known in the art, e.g. as disclosed in WO 
90/13540. Specific examples of activated linear or branched polymer molecules for use in the 
present invention are described in the Shearwater Polymers, Inc. 1997 and 2000 Catalogs
(Functionalized Biocompatible Polymers for Research and Pharmaceuticals, Polyethylene
Glycol and Derivatives, incorporated herein by reference). Specific examples of activated 
PEG polymers include the following linear PEGs: NHS-PEG (e.g. SPA-PEG, SSPA-PEG,
SBA-PEG, SS-PEG, SSA-PEG, SC-PEG, SG-PEG and SCM-PEG), NOR-PEG, BTC-PEG,
EPOX-PEG, NCO-PEG, NPC-PEG, CDI-PEG, ALD-PEG, TRES-PEG, VS-PEG,
IODO-PEG and MAL-PEG, and branched PEGs such as PEG2-NHS and those disclosed in
US 5,932,462 and US 5,643,575, both of which are incorporated herein by reference.
Furthermore, the following publications, incorporated herein by reference, disclose useful
polymer molecules and/or PEGylation chemistries: US 5,824,778, US 5,476,653, WO 97/32607,
EP 229,108, EP 402,378, US 4,902,502, US 5,281,698, US 5,122,614, US 5,219,564,
WO 92/16555, WO 94/04193, WO 94/14758, WO 94/17039, WO 94/18247, WO 94/28024,
WO 95/00162, WO 95/11924, WO 95/13090, WO 95/33490, WO 96/00080, WO 97/18832,
WO 98/41562, WO 98/48837, WO 99/32134, WO 99/32139, WO 99/32140, WO 96/40791,
WO 98/32466, WO 95/06058, EP 439 508, WO 97/03106, WO 96/21469, WO 95/13312,
EP 921 131, US 5,736,625, WO 98/05363, EP 809 996, US 5,629,384, WO 96/41813,
WO 96/07670, US 5,473,034, US 5,516,673, EP 605 963, US 5,382,657, EP 510 356,
EP 400 472, EP 183 503 and EP 154 316.

The conjugation of the polypeptide and the activated polymer molecules is conducted by use
of any conventional method, e.g. as described in the following references (which also describe
suitable methods for activation of polymer molecules): R.F. Taylor, (1991), "Protein
immobilisation. Fundamentals and applications", Marcel Dekker, N.Y.; S.S. Wong, (1992),
"Chemistry of Protein Conjugation and Crosslinking", CRC Press, Boca Raton; G.T. Hermanson
et al., (1993), "Immobilized Affinity Ligand Techniques", Academic Press, N.Y.
The skilled person will be aware that the activation method and/or conjugation chemistry to
be used depends on the attachment group(s) of the polypeptide (examples of which are given
further above), as well as on the functional groups of the polymer (e.g. amine, hydroxyl,
carboxyl, aldehyde, sulfhydryl, succinimidyl, maleimide, vinylsulfone or haloacetate). The
PEGylation may be directed towards conjugation to all available attachment groups on the
polypeptide (i.e. such attachment groups that are exposed at the surface of the polypeptide) or
may be directed towards one or more specific attachment groups, e.g. the N-terminal amino
group (US 5,985,265). Furthermore, the conjugation may be achieved in one step or in a
stepwise manner (e.g. as described in WO 99/55377).

It will be understood that the PEGylation is designed so as to produce the optimal molecule 
with respect to the number of PEG molecules attached, the size and form of such molecules 
(e.g. whether they are linear or branched), and the attachment site(s) in the polypeptide. The 
molecular weight of the polymer to be used may e.g. be chosen on the basis of the desired 
effect to be achieved. For instance, in order to obtain reduced renal clearance (and thus
increased half-life of the conjugate), the molecular weight of the conjugate is important.
Accordingly, for this purpose the PEGylation is designed so as to achieve a sufficiently high
molecular weight of the conjugate, e.g. a molecular weight of at least about 67 kDa in many
cases. As indicated above, in other cases it may however be desirable to have a molecular
weight that is somewhat increased but still below about 67 kDa. In such cases, PEGylation
may be performed so as to produce conjugates having one or more relatively small
PEG polymers, for example one, two or three PEG polymers each having a molecular weight
of, e.g., up to about 5000 Da.
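The molecular-weight reasoning above is essentially addition, which can be sketched in Python. This is a minimal illustration under stated assumptions: nominal masses are taken to add linearly, the larger hydrodynamic size of PEG (which also affects renal clearance) is ignored, and the 30 kDa polypeptide mass in the usage lines is a hypothetical placeholder, not a value given in this document. The function names are illustrative only.

```python
# Minimal sketch: nominal molecular weight of a PEG conjugate.
# Assumptions: masses add linearly; hydrodynamic size is NOT modelled.

RENAL_THRESHOLD_DA = 67_000  # "at least about 67 kDa", as stated in the text


def conjugate_mass_da(polypeptide_mass_da: float, peg_masses_da: list[float]) -> float:
    """Return the nominal mass of a polypeptide carrying the given PEG chains."""
    if polypeptide_mass_da <= 0 or any(m <= 0 for m in peg_masses_da):
        raise ValueError("masses must be positive")
    return polypeptide_mass_da + sum(peg_masses_da)


def exceeds_threshold(polypeptide_mass_da: float, peg_masses_da: list[float],
                      threshold_da: float = RENAL_THRESHOLD_DA) -> bool:
    """True if the nominal conjugate mass reaches the threshold."""
    return conjugate_mass_da(polypeptide_mass_da, peg_masses_da) >= threshold_da


if __name__ == "__main__":
    base = 30_000  # hypothetical polypeptide mass, for illustration only
    designs = {
        "3 x 5 kDa PEG": [5_000] * 3,   # several small PEG polymers
        "1 x 20 kDa PEG": [20_000],     # a single larger PEG polymer
    }
    for label, pegs in designs.items():
        print(label, conjugate_mass_da(base, pegs), exceeds_threshold(base, pegs))
```

Because the threshold is a parameter, the same helper can be used to check a design against a lower target when only a moderate increase in molecular weight is wanted.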

In connection with conjugation to only a single attachment group on the protein (as described 
in US 5,985,265), it may be advantageous that the polymer molecule, which may be linear or 
branched, has a high molecular weight, e.g. about 20 kDa. 

In a specific embodiment, the polypeptide conjugate of the invention is one which comprises
a single PEG molecule attached to the N-terminus of the polypeptide and no other PEG molecules,
in particular a linear or branched PEG molecule with a molecular weight of at least
about 20 kDa. The polypeptide according to this embodiment may further comprise one or 
more oligosaccharide moieties attached to an N-linked or O-linked glycosylation site of the 
polypeptide or carbohydrate moieties attached by in vitro glycosylation. 

In another specific embodiment, the polypeptide conjugate of the invention comprises a PEG 
molecule attached to each of the lysine residues in the polypeptide available for PEGylation, 
in particular a linear or branched PEG molecule, e.g. with a molecular weight of about 5 
kDa. 

In yet another embodiment, the polypeptide conjugate of the invention comprises a PEG
molecule attached to each of the lysine residues in the polypeptide available for PEGylation
and, in addition, a PEG molecule attached to the N-terminal amino acid residue of the polypeptide.
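For the embodiments in which every available lysine (and, optionally, the N-terminal residue) carries a PEG molecule, an upper bound on the number of PEG chains can be read off the sequence. The Python sketch below assumes that every lysine ε-amino group and every chain N-terminus is accessible, which will not hold for every residue in a folded protein; the chain sequences are hypothetical placeholders and the function names are illustrative.

```python
# Minimal sketch: upper bound on amine attachment groups for PEGylation.
# Assumption: every Lys side chain and every chain N-terminus is surface
# exposed; in a folded protein some of these groups will be inaccessible.

def amine_attachment_sites(chains: list[str], include_n_termini: bool = True) -> int:
    """Count Lys residues (one-letter 'K') plus one N-terminal amine per chain."""
    lysines = sum(chain.upper().count("K") for chain in chains)
    n_termini = len(chains) if include_n_termini else 0
    return lysines + n_termini


def max_peg_mass_da(chains: list[str], peg_mass_da: float = 5_000,
                    include_n_termini: bool = True) -> float:
    """Total PEG mass if one PEG of the given size sits on every counted site."""
    return amine_attachment_sites(chains, include_n_termini) * peg_mass_da


if __name__ == "__main__":
    # Hypothetical two-chain example (a heterodimer has two N-termini).
    chains = ["AKGSKT", "MKLLA"]
    print(amine_attachment_sites(chains))         # 5
    print(amine_attachment_sites(chains, False))  # 3
    print(max_peg_mass_da(chains))                # 25000
```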

Normally, the polymer conjugation is performed under conditions aiming at reacting all available
polymer attachment groups with polymer molecules. Typically, the molar ratio of activated
polymer molecules to polypeptide is up to 500:1, such as 200:1, preferably 100:1, such as 50:1
or 25:1, in order to obtain optimal reaction. Furthermore, the polymer modification, such as
PEGylation, is conveniently carried out at a pH in the range of 7-10, such as in the range of
8-10, in particular in the range of 8-9.
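The molar ratios above translate directly into reagent quantities. The Python sketch below converts a chosen polymer-to-polypeptide molar ratio into a mass of activated polymer and checks a pH against the preferred 8-9 window. It assumes nominal average molecular weights and a fully active reagent (a real activated PEG may need a purity correction); the example of 1 mg of a 30 kDa polypeptide with 5 kDa PEG is hypothetical.

```python
# Minimal sketch: activated-polymer mass implied by a molar excess.
# Assumptions: nominal average masses; the activated reagent is 100 % active.

def activated_polymer_mass_mg(polypeptide_mg: float, polypeptide_mw_da: float,
                              polymer_mw_da: float, molar_ratio: float) -> float:
    """Mass (mg) of activated polymer for a polymer:polypeptide molar ratio."""
    if min(polypeptide_mg, polypeptide_mw_da, polymer_mw_da, molar_ratio) <= 0:
        raise ValueError("all inputs must be positive")
    polypeptide_mmol = polypeptide_mg / polypeptide_mw_da  # mg / (g/mol) = mmol
    polymer_mmol = polypeptide_mmol * molar_ratio
    return polymer_mmol * polymer_mw_da                    # mmol * (g/mol) = mg


def ph_in_range(ph: float, low: float = 8.0, high: float = 9.0) -> bool:
    """Check a reaction pH against the 8-9 window mentioned in the text."""
    return low <= ph <= high


if __name__ == "__main__":
    for ratio in (500, 200, 100, 50, 25):
        mg = activated_polymer_mass_mg(1.0, 30_000, 5_000, ratio)
        print(f"{ratio}:1 -> {mg:.1f} mg activated PEG")
```

With these hypothetical inputs a 50:1 ratio comes to roughly 8.3 mg of activated PEG per milligram of polypeptide.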

It is also contemplated according to the invention to couple the polymer molecules to the polypeptide
through a linker. Suitable linkers are well known to the skilled person. A preferred example
is cyanuric chloride (Abuchowski et al., (1977), J. Biol. Chem., 252, 3578-3581; US
4,179,337; Shafer et al., (1986), J. Polym. Sci. Polym. Chem. Ed., 24, 375-378).
Subsequent to the conjugation, residual activated polymer molecules are blocked according to
methods known in the art, e.g. by addition of a primary amine to the reaction mixture, and the
resulting inactivated polymer molecules are removed by a suitable method.

Covalent in vitro coupling of carbohydrate moieties or glycosides (such as dextran) to amino
acid residues of the polypeptide may also be used, e.g. as described in WO 87/05330 and in
Aplin et al., CRC Crit. Rev. Biochem., pp. 259-306, 1981. The in vitro coupling of carbohydrate
moieties or PEG to protein- and peptide-bound Gln residues can be carried out by
transglutaminases (TGases). Transglutaminases catalyse the transfer of donor amine groups to
protein- and peptide-bound Gln residues in a so-called cross-linking reaction. The donor
amine groups can be protein- or peptide-bound, e.g. as the ε-amino group in Lys residues, or
can be part of a small or large organic molecule. An example of a small organic molecule
functioning as amino donor in TGase-catalysed cross-linking is putrescine (1,4-diaminobutane).
An example of a larger organic molecule functioning as amino donor in
TGase-catalysed cross-linking is an amine-containing PEG (Sato et al., Biochemistry 35,
13072-13080).

TGases, in general, are highly specific enzymes, and not every Gln residue exposed on the
surface of a protein is accessible to TGase-catalysed cross-linking to amino-containing substances.
On the contrary, only a few Gln residues function naturally as TGase substrates, but
the exact parameters governing which Gln residues are good TGase substrates remain unknown.
Thus, in order to render a protein susceptible to TGase-catalysed cross-linking reactions,
it is often necessary to add, at convenient positions, stretches of amino acid sequence
known to function very well as TGase substrates. Several amino acid sequences are known to
be or to contain excellent natural TGase substrates, e.g. substance P, elafin, fibrinogen,
fibronectin, α2-plasmin inhibitor, α-caseins and β-caseins.

Coupling to an oligosaccharide moiety 

The conjugation to an oligosaccharide moiety takes place by in vivo glycosylation effected by
a glycosylating, eukaryotic expression host. The expression host cell may be selected from
fungal (filamentous fungal or yeast), insect or animal cells or from transgenic plant cells. In
one embodiment the host cell is a mammalian cell, such as a CHO, BHK or HEK (e.g.
HEK 293) cell, or an insect cell, such as an Sf9 cell, or a yeast cell, e.g. S. cerevisiae or
Pichia pastoris, or any of the host cells mentioned hereinafter.

Coupling to an organic derivatizing agent 

Covalent modification of the polypeptide exhibiting FSH activity may be performed by reacting
one or more attachment groups of the polypeptide with an organic derivatizing agent.
Suitable derivatizing agents and methods are well known in the art. For example, cysteinyl
residues most commonly are reacted with α-haloacetates (and corresponding amines), such as
chloroacetic acid or chloroacetamide, to give carboxymethyl or carboxyamidomethyl derivatives.
Cysteinyl residues also are derivatized by reaction with bromotrifluoroacetone,
α-bromo-β-(4-imidazolyl)propionic acid, chloroacetyl phosphate, N-alkylmaleimides,
3-nitro-2-pyridyl disulfide, methyl 2-pyridyl disulfide, p-chloromercuribenzoate,
2-chloromercuri-4-nitrophenol, or chloro-7-nitrobenzo-2-oxa-1,3-diazole. Histidyl residues are
derivatized by reaction with diethylpyrocarbonate at pH 5.5-7.0, because this agent is relatively
specific for the histidyl side chain. Para-bromophenacyl bromide is also useful. The reaction is
preferably performed in 0.1 M sodium cacodylate at pH 6.0. Lysinyl and amino-terminal residues
are reacted with succinic or other carboxylic acid anhydrides. Derivatization with these agents
has the effect of reversing the charge of the lysinyl residues. Other suitable reagents for
derivatizing α-amino-containing residues include imidoesters such as methyl picolinimidate,
pyridoxal phosphate, pyridoxal, chloroborohydride, trinitrobenzenesulfonic acid,
O-methylisourea, 2,4-pentanedione, and transaminase-catalyzed reaction with glyoxylate.
Arginyl residues are modified by reaction with one or several conventional reagents, among
them phenylglyoxal, 2,3-butanedione, 1,2-cyclohexanedione and ninhydrin. Derivatization
of arginine residues requires that the reaction be performed under alkaline conditions because
of the high pKa of the guanidine functional group.

Furthermore, these reagents may react with the amino groups of lysine as well as with the
arginine guanidino group. Carboxyl side groups (aspartyl or glutamyl) are selectively modified
by reaction with carbodiimides (R-N=C=N-R'), where R and R' are different alkyl groups, such as
1-cyclohexyl-3-(2-morpholinyl-4-ethyl) carbodiimide or 1-ethyl-3-(4-azonia-4,4-dimethylpentyl)
carbodiimide. Furthermore, aspartyl and glutamyl residues are converted to
asparaginyl and glutaminyl residues by reaction with ammonium ions.

Blocking of a functional site

It has been reported that excessive polymer conjugation can lead to a loss of activity of the 
polypeptide to which the polymer is conjugated. This problem can be eliminated, e.g., by 
removal of attachment groups located at the functional site or by blocking the functional site 
prior to conjugation. The latter strategy constitutes a further embodiment of the invention (the 
first strategy being exemplified further above, e.g. by removal of lysine residues which may 
be located close to the functional site). More specifically, according to the second strategy the 
conjugation between the polypeptide and the non-polypeptide moiety ii) is conducted under 
conditions where the functional site of the polypeptide i) is blocked by a helper molecule ca- 
pable of binding to the functional site of the polypeptide i). 

Preferably, the helper molecule is one which specifically recognizes a functional site of the 
polypeptide, such as a receptor, in particular the FSH receptor or a part of the FSH receptor. 
Alternatively, the helper molecule may be an antibody, in particular a monoclonal antibody 
recognizing the polypeptide exhibiting FSH activity. In particular, the helper molecule may 
be a neutralizing monoclonal antibody. 

The polypeptide is allowed to interact with the helper molecule before effecting conjugation. 
This ensures that the functional site of the polypeptide is shielded or protected and conse- 
quently unavailable for derivatization by the non-polypeptide moiety such as a polymer. Fol- 
lowing its elution from the helper molecule, the conjugate between the non-polypeptide moi- 
ety and the polypeptide can be recovered with at least a partially preserved functional site. 

The subsequent conjugation of the polypeptide having a blocked functional site to a polymer, 
a lipophilic compound, an oligosaccharide moiety, an organic derivatizing agent or any other 
compound is conducted in the normal way, e.g. as described in the sections above entitled
"Conjugation to ...".

Irrespective of the nature of the helper molecule to be used to shield the functional site of the 
polypeptide from conjugation, it is desirable that the helper molecule is free of or comprises 
only a few attachment groups for the non-polypeptide moiety of choice in any parts of the 
molecule where the conjugation to such groups will hamper the desorption of the conjugated 
polypeptide from the helper molecule. Hereby, selective conjugation to attachment groups 
present in non-shielded parts of the polypeptide can be obtained and it is possible to reuse the 
helper molecule for repeated cycles of conjugation. For instance, if the non-polypeptide moiety
is a polymer molecule such as PEG which has the epsilon amino group of a lysine or the N-terminal
amino acid residue as an attachment group, it is desirable that the helper molecule is
substantially free of conjugatable epsilon amino groups, preferably free of any epsilon amino 
groups. Accordingly, in a preferred embodiment the helper molecule is a protein or peptide 
capable of binding to the functional site of the polypeptide, which protein or peptide is free of 
any conjugatable attachment groups for the non-polypeptide moiety of choice. 

In a further embodiment the helper molecule is first covalently linked to a solid phase such as 
column packing materials, for instance Sephadex or agarose beads, or a surface, e.g. a reac- 
tion vessel. Subsequently, the polypeptide is loaded onto the column material carrying the 
helper molecule, and conjugation is carried out according to methods known in the art, e.g. as
described in the sections above entitled "Conjugation to ...". This procedure allows the
polypeptide conjugate to be separated from the helper molecule by elution. The polypeptide
conjugate is eluted by conventional techniques under physico-chemical conditions that do not
lead to substantial degradation of the polypeptide conjugate. The fluid phase containing the
polypeptide conjugate is separated from the solid phase to which the helper molecule remains
covalently linked. The separation can also be achieved in other ways. For instance, the helper
molecule may be derivatised with a second molecule (e.g. biotin) that can be recognized by a 
specific binder (e.g. streptavidin). The specific binder may be linked to a solid phase thereby 
allowing the separation of the polypeptide conjugate from the helper molecule-second mole- 
cule complex through passage over a second helper-solid phase column which will retain, 
upon subsequent elution, the helper molecule-second molecule complex, but not the polypep- 
tide conjugate. The polypeptide conjugate may be released from the helper molecule in any 
appropriate fashion. Deprotection may be achieved by providing conditions in which the 
helper molecule dissociates from the functional site of the FSH to which it is bound. For in- 
stance, a complex between an antibody to which a polymer is conjugated and an anti-idiotypic 
antibody can be dissociated by adjusting the pH to an acid or alkaline pH. 

Conjugation of a tagged polypeptide 

In an alternative embodiment the polypeptide i) is expressed as a fusion protein with a tag, 
i.e. an amino acid sequence or peptide stretch made up of typically 1-30, such as 1-20 amino 
acid residues. Besides allowing for fast and easy purification, the tag is a convenient tool for 
achieving conjugation between the tagged polypeptide i) and the non-polypeptide moiety ii). 
In particular, the tag may be used for achieving conjugation in microtiter plates or other car- 
riers, such as paramagnetic beads, to which the tagged polypeptide can be immobilised via 
the tag. The conjugation to the tagged polypeptide i) in, e.g., microtiter plates has the advan- 
tage that the tagged polypeptide can be immobilised in the microtiter plates directly from the 
culture broth (in principle without any purification) and subjected to conjugation. Thereby, 
the total number of process steps (from expression to conjugation) can be reduced. Furthermore,
the tag may function as a spacer molecule ensuring improved accessibility of the
immobilised polypeptide to be conjugated. The conjugation using a tagged polypeptide i) may
be to any of the non-polypeptide moieties disclosed herein, e.g. to a polymer molecule such 
as PEG. 

The identity of the specific tag to be used is not critical as long as the tag is capable of being
expressed with the polypeptide i) and is capable of being immobilised on a suitable surface or
carrier material. A number of suitable tags are commercially available, e.g. from Unizyme
Laboratories, Denmark. For instance, the tag may consist of any of the following sequences:

His-His-His-His-His-His 

Met-Lys-His-His-His-His-His-His 

Met-Lys-His-His-Ala-His-His-Gln-His-His 

Met-Lys-His-Gln-His-Gln-His-Gln-His-Gln-His-Gln-His-Gln

Met-Lys-His-Gln-His-Gln-His-Gln-His-Gln-His-Gln-His-Gln-Gln

or any of the following:

EQKLISEEDL (a C-terminal tag described in Mol. Cell. Biol. 5:3610-16, 1985)
DYKDDDDK (a C- or N-terminal tag)
YPYDVPDYA

Antibodies against the above tags are commercially available, e.g. from ADI, Aves Lab and 
Research Diagnostics. 
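The tag sequences listed above are partly written in three-letter code. The small Python helper sketched below, as an illustration only, converts them to one-letter code and fuses a tag to either terminus of a polypeptide; the function names and the placeholder polypeptide sequence are hypothetical, and the sketch does not model any cleavage site for subsequent removal of the tag.

```python
# Minimal sketch: three-letter to one-letter conversion and tag fusion.
# The polypeptide sequence used in the usage lines is a hypothetical placeholder.

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def three_to_one(tag: str) -> str:
    """Convert a hyphen-separated three-letter sequence, e.g. 'Met-Lys-His'."""
    return "".join(THREE_TO_ONE[res] for res in tag.split("-"))


def fuse_tag(polypeptide: str, tag: str, terminus: str = "N") -> str:
    """Attach a one-letter tag to the N- or C-terminus of a polypeptide."""
    if terminus == "N":
        return tag + polypeptide
    if terminus == "C":
        return polypeptide + tag
    raise ValueError("terminus must be 'N' or 'C'")


def has_his6(seq: str) -> bool:
    """True if the sequence contains six consecutive histidines."""
    return "HHHHHH" in seq


if __name__ == "__main__":
    tag = three_to_one("Met-Lys-His-His-His-His-His-His")
    print(tag, has_his6(tag))
    print(fuse_tag("PEPTIDE", "DYKDDDDK", terminus="C"))
```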

The subsequent cleavage of the tag from the polypeptide i) may be achieved by use of com- 
mercially available enzymes. 

Methods for preparing a polypeptide of the invention or the polypeptide i) of the conjugate of the invention

The polypeptide of the present invention or the polypeptide part of a conjugate of the inven- 
tion, optionally in glycosylated form, may be produced by any suitable method known in the 
art. Such methods include constructing a nucleotide sequence encoding the polypeptide and
expressing the sequence in a suitable transformed or transfected host. Polypeptides of the
invention may also be produced, albeit less efficiently, by chemical synthesis or a combination
of chemical synthesis and recombinant DNA technology.

FSH-α and FSH-β may be expressed separately and subsequently allowed to dimerize. However,
it is preferred that FSH-α and FSH-β are expressed by the same host cell and dimerized
in vivo prior to purification and any conjugation to a non-polypeptide moiety. Co-expression
of FSH-α and FSH-β in CHO cells is described by Keene et al., J Biol Chem 1989 25;
264(9): 4769-75. Alternatively, the polypeptide i) may be expressed as a single-chain polypeptide
wherein the nucleotide sequences encoding FSH-α and FSH-β are fused, directly or
using a suitable linker, and expressed as a single-chain polypeptide using an approach similar
to that described in US 5,883,073.

The nucleotide sequence encoding FSH-α or FSH-β modified according to the invention may
be constructed by isolating or synthesizing a nucleotide sequence encoding the parent FSH
subunit, such as hFSH-α or hFSH-β with the amino acid sequence shown in SEQ ID NO 2 or
4, respectively, or the precursor form thereof (shown in SEQ ID NO 1 and 3, respectively),
and then changing the nucleotide sequence so as to effect introduction (i.e. insertion or
substitution) or deletion (i.e. removal or substitution) of the relevant amino acid residue(s). The
nucleotide sequence is conveniently modified by site-directed mutagenesis in accordance with
conventional methods. Alternatively, the nucleotide sequence may be prepared by chemical
synthesis, e.g. by using an oligonucleotide synthesizer, wherein oligonucleotides are designed
based on the amino acid sequence of the desired polypeptide, preferably selecting those
codons that are favored in the host cell in which the recombinant polypeptide will be produced.
For example, several small oligonucleotides coding for portions of the desired polypeptide
may be synthesized and assembled by PCR, ligation or ligase chain reaction (LCR)
(Barany, PNAS 88:189-193, 1991). The individual oligonucleotides typically contain 5' or 3'
overhangs for complementary assembly.
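The chemical-synthesis route described above (choosing host-favoured codons, then assembling overlapping oligonucleotides) can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the codon table assigns one valid standard-genetic-code codon to each residue for illustration and is not a measured codon preference of any host cell, the oligonucleotide length and overlap are arbitrary defaults, and `assemble` only checks the design in silico; it does not model annealing, PCR or ligation. All names are hypothetical.

```python
# Minimal sketch of a synthetic-gene design:
# (1) back-translate a protein using one "preferred" codon per residue,
# (2) tile the DNA into oligonucleotides that alternate strands and overlap.
# The codon table is ILLUSTRATIVE: valid standard-code codons, not host data.

PREFERRED_CODON = {
    "A": "GCC", "R": "AGA", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
    "*": "TGA",  # stop codon
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def back_translate(protein: str) -> str:
    return "".join(PREFERRED_CODON[aa] for aa in protein)


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def design_oligos(dna: str, oligo_len: int = 40, overlap: int = 15) -> list[str]:
    """Tile the sequence; odd-numbered oligos are given as reverse complements."""
    if not 0 < overlap < oligo_len:
        raise ValueError("need 0 < overlap < oligo_len")
    step = oligo_len - overlap
    oligos, start = [], 0
    while True:
        chunk = dna[start:start + oligo_len]
        oligos.append(chunk if len(oligos) % 2 == 0 else reverse_complement(chunk))
        if start + oligo_len >= len(dna):
            return oligos
        start += step


def assemble(oligos: list[str], overlap: int) -> str:
    """Verify the overlaps and rebuild the top strand (in-silico check only)."""
    top = [o if i % 2 == 0 else reverse_complement(o) for i, o in enumerate(oligos)]
    dna = top[0]
    for frag in top[1:]:
        if dna[-overlap:] != frag[:overlap]:
            raise ValueError("adjacent oligos do not overlap as expected")
        dna += frag[overlap:]
    return dna


if __name__ == "__main__":
    # Hypothetical short protein sequence ending in a stop ("*").
    dna = back_translate("MKHHHHHHGSDYKDDDDK*")
    oligos = design_oligos(dna, oligo_len=30, overlap=10)
    assert assemble(oligos, overlap=10) == dna
    for i, oligo in enumerate(oligos):
        print("sense" if i % 2 == 0 else "antisense", oligo)
```

Giving alternate oligonucleotides as reverse complements mirrors the usual arrangement for PCR-based assembly, where each oligo anneals to its neighbours on the opposite strand.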

Once assembled (by synthesis, site-directed mutagenesis or another method), the nucleotide
sequence encoding the polypeptide is inserted into a recombinant vector and operably linked
to control sequences necessary for expression of the FSH in the desired transformed host cell.
It should of course be understood that not all vectors and expression control sequences function
equally well to express the nucleotide sequence encoding a polypeptide described herein.
Neither will all hosts function equally well with the same expression system. However, one
of skill in the art may make a selection among these vectors, expression control sequences
and hosts without undue experimentation. For example, in selecting a vector, the host must
be considered because the vector must replicate in it or be able to integrate into the chromosome.
The vector's copy number, the ability to control that copy number, and the expression
of any other proteins encoded by the vector, such as antibiotic markers, should also be considered.
In selecting an expression control sequence, a variety of factors should also be considered.
These include, for example, the relative strength of the sequence, its controllability,
and its compatibility with the nucleotide sequence encoding the polypeptide, particularly as
regards potential secondary structures. Hosts should be selected by consideration of their
compatibility with the chosen vector, the toxicity of the product coded for by the nucleotide
sequence, their secretion characteristics, their ability to fold the polypeptide correctly, their
fermentation or culture requirements, and the ease of purification of the products coded for
by the nucleotide sequence.

The recombinant vector may be an autonomously replicating vector, i.e. a vector which exists
as an extrachromosomal entity, the replication of which is independent of chromosomal
replication, e.g. a plasmid. Alternatively, the vector is one which, when introduced into a
host cell, is integrated into the host cell genome and replicated together with the
chromosome(s) into which it has been integrated.

The vector is preferably an expression vector in which the nucleotide sequence encoding the
polypeptide of the invention is operably linked to additional segments required for transcription
of the nucleotide sequence. The vector is typically derived from plasmid or viral DNA.
A number of suitable expression vectors for expression in the host cells mentioned herein are
commercially available or described in the literature. Useful expression vectors for eukaryotic
hosts include, for example, vectors comprising expression control sequences from
SV40, bovine papilloma virus, adenovirus and cytomegalovirus. Specific vectors are, e.g.,
pCDNA3.1(+)\Hyg (Invitrogen, Carlsbad, CA, USA) and pCI-neo (Stratagene, La Jolla,
CA, USA). Useful expression vectors for yeast cells include the 2μ plasmid and derivatives
thereof, the POT1 vector (US 4,931,373), the pJS037 vector described in Okkels, Ann. New
York Acad. Sci. 782, 202-207, 1996, and pPICZ A, B or C (Invitrogen). Useful vectors for
insect cells include pVL941, pBG311 (Cate et al., "Isolation of the Bovine and Human Genes
for Mullerian Inhibiting Substance and Expression of the Human Gene in Animal Cells",
Cell, 45, pp. 685-98 (1986)), pBluebac 4.5 and pMelbac (both available from Invitrogen).
Useful expression vectors for bacterial hosts include known bacterial plasmids, such as plasmids
from E. coli, including pBR322, pET3a and pET12a (both from Novagen Inc., WI,
USA), wider host range plasmids, such as RP4, phage DNAs, e.g. the numerous derivatives
of phage lambda, e.g. NM989, and other DNA phages, such as M13 and filamentous
single-stranded DNA phages.

Other vectors for use in this invention include those that allow the nucleotide sequence
encoding the polypeptide to be amplified in copy number. Such amplifiable vectors are well
known in the art. They include, for example, vectors able to be amplified by DHFR
amplification (see, e.g., Kaufman, U.S. Pat. No. 4,470,461; Kaufman and Sharp,
"Construction Of A Modular Dihydrofolate Reductase cDNA Gene: Analysis Of Signals
Utilized For Efficient Expression", Mol. Cell. Biol., 2, pp. 1304-19 (1982)) and glutamine
synthetase ("GS") amplification (see, e.g., US 5,122,464 and EP 338,841).
In a preferred embodiment a pair of expression vectors is used for expressing the polypeptide
i) of the invention or the polypeptide constituting part of a conjugate of the invention. Each of
the vectors of said pair is capable of transfecting a eukaryotic cell as described herein, and the
vectors comprise nucleotide sequences encoding, respectively, a modified FSH-α as described
herein and a wildtype FSH-β subunit, a modified FSH-β as described herein and a wildtype
FSH-α subunit, or a modified FSH-α and a modified FSH-β as described herein. The use of a
pair of vectors is, e.g., described in EP 211,894.

The recombinant vector may further comprise a DNA sequence enabling the vector to replicate
in the host cell in question. An example of such a sequence (when the host cell is a
mammalian cell) is the SV40 origin of replication. When the host cell is a yeast cell, suitable
sequences enabling the vector to replicate are the yeast plasmid 2μ replication genes REP 1-3
and origin of replication.

The vector may also comprise a selectable marker, e.g. a gene whose product complements a
defect in the host cell, such as the gene coding for dihydrofolate reductase (DHFR) or the
Schizosaccharomyces pombe TPI gene (described by P.R. Russell, Gene 40, 1985, pp. 125-130),
or one which confers resistance to a drug, e.g. ampicillin, kanamycin, tetracycline,
chloramphenicol, neomycin, hygromycin or methotrexate. For Saccharomyces cerevisiae,
selectable markers include ura3 and leu2. For filamentous fungi, selectable markers include
amdS, pyrG, argB, niaD and sC.

The term "control sequences" is defined herein to include all components which are necessary
or advantageous for the expression of the polypeptide of the invention. Each control sequence
may be native or foreign to the nucleic acid sequence encoding the polypeptide. Such
control sequences include, but are not limited to, a leader sequence, polyadenylation sequence,
propeptide sequence, promoter, enhancer or upstream activating sequence, signal
peptide sequence, and transcription terminator. At a minimum, the control sequences include
a promoter.

A wide variety of expression control sequences may be used in the present invention. Such 
useful expression control sequences include the expression control sequences associated with 
structural genes of the foregoing expression vectors as well as any sequence known to control 
the expression of genes of prokaryotic or eukaryotic cells or their viruses, and various
combinations thereof.

Examples of suitable control sequences for directing transcription in mammalian cells include
the early and late promoters of SV40 and adenovirus, e.g. the adenovirus 2 major late promoter,
the MT-1 (metallothionein gene) promoter, the human cytomegalovirus immediate-early
gene promoter (CMV), the human elongation factor 1α (EF-1α) promoter, the Drosophila
minimal heat shock protein 70 promoter, the Rous Sarcoma Virus (RSV) promoter, the
human ubiquitin C (UbC) promoter, the human growth hormone terminator, SV40 or adenovirus
E1b region polyadenylation signals and the Kozak consensus sequence (Kozak, M. J
Mol Biol 1987 Aug 20; 196(4):947-50).

In order to improve expression in mammalian cells a synthetic intron may be inserted in the 
5' untranslated region of the nucleotide sequence encoding the polypeptide. An example of a 
synthetic intron is the synthetic intron from the plasmid pCI-Neo (available from Promega
Corporation, WI, USA).
Examples of suitable control sequences for directing transcription in insect cells include the
polyhedrin promoter, the P10 promoter, the Autographa californica polyhedrosis virus basic
protein promoter, the baculovirus immediate early gene 1 promoter and the baculovirus 39K
delayed-early gene promoter, and the SV40 polyadenylation sequence. Examples of suitable
control sequences for use in yeast host cells include the promoters of the yeast α-mating system,
the yeast triose phosphate isomerase (TPI) promoter, promoters from yeast glycolytic
genes or alcohol dehydrogenase genes, the ADH2-4c promoter, and the inducible GAL promoter.
Examples of suitable control sequences for use in filamentous fungal host cells include
the ADH3 promoter and terminator, a promoter derived from the genes encoding Aspergillus
oryzae TAKA amylase, triose phosphate isomerase or alkaline protease, an A. niger α-amylase,
A. niger or A. nidulans glucoamylase, A. nidulans acetamidase, Rhizomucor miehei
aspartic proteinase or lipase, the TPI1 terminator and the ADH3 terminator. Examples of 
suitable control sequences for use in bacterial host cells include promoters of the lac system, 
the trp system, the TAC or TRC system, and the major promoter regions of phage lambda. 

The presence or absence of a signal peptide will depend, e.g., on the expression host cell used
for the production of the polypeptide (whether it is an intracellular or extracellular polypeptide)
and on whether secretion is desired. For use in filamentous fungi, the signal peptide may
conveniently be derived from a gene encoding an Aspergillus sp. amylase or glucoamylase, a gene
encoding a Rhizomucor miehei lipase or protease, or a Humicola lanuginosa lipase. The signal
peptide is preferably derived from a gene encoding A. oryzae TAKA amylase, A. niger neutral
α-amylase, A. niger acid-stable amylase, or A. niger glucoamylase. For use in insect cells, the
signal peptide may conveniently be derived from an insect gene (cf. WO 90/05783), such as the
Lepidopteran Manduca sexta adipokinetic hormone precursor (cf. US 5,023,328), the honeybee
melittin (Invitrogen), ecdysteroid UDP-glucosyltransferase (egt) (Murphy et al., Protein
Expression and Purification 4, 349-357 (1993)) or human pancreatic lipase (hpl) (Methods in
Enzymology 284, pp. 262-272, 1997). A preferred signal peptide for use in mammalian cells is
that of hFSH or the murine Ig kappa light chain signal peptide (Coloma, M (1992) J. Imm.
Methods 152:89-104). For use in yeast cells, suitable signal peptides have been found to be the
α-factor signal peptide from S. cerevisiae (cf. US 4,870,008), a modified carboxypeptidase signal
peptide (cf. L.A. Valls et al., Cell 48, 1987, pp. 887-897), the yeast BAR1 signal peptide (cf.
WO 87/02670), the yeast aspartic protease 3 (YAP3) signal peptide (cf. M. Egel-Mitani et al.,
Yeast 6, 1990, pp. 127-137), and the synthetic leader sequence TA57 (WO 98/32867). For use in
E. coli cells, a suitable signal peptide has been found to be the ompA signal peptide (EP 581821).

The nucleotide sequence of the invention encoding a polypeptide exhibiting FSH activity,
whether prepared by site-directed mutagenesis, synthesis, PCR or other methods, may optionally
also include a nucleotide sequence that encodes a signal peptide. The signal peptide is present
when the polypeptide is to be secreted from the cells in which it is expressed. Such a signal
peptide, if present, should be one recognized by the cell chosen for expression of the
polypeptide. The signal peptide may be homologous (e.g. that normally associated with an
hFSH subunit) or heterologous (i.e. originating from a source other than hFSH) to the
polypeptide, and may be homologous or heterologous to the host cell, i.e. a signal peptide
normally expressed from the host cell or one which is not. Accordingly, the signal peptide may
be prokaryotic, e.g. derived from a bacterium such as E. coli, or eukaryotic, e.g. derived from a
mammalian, insect or yeast cell.

Any suitable host may be used to produce the polypeptide or polypeptide part of the conjugate
of the invention, including bacteria, fungi (including yeasts), plant, insect, mammal, or other
appropriate animal cells or cell lines, as well as transgenic animals or plants. Examples of
bacterial host cells include gram-positive bacteria such as strains of Bacillus, e.g. B. brevis or
B. subtilis, or Streptomyces, or gram-negative bacteria, such as strains of E. coli or
Pseudomonas. The introduction of a vector into a bacterial host cell may, for instance, be
effected by protoplast transformation (see, e.g., Chang and Cohen, 1979, Molecular General
Genetics 168: 111-115), using competent cells (see, e.g., Young and Spizizen, 1961, Journal of
Bacteriology 81: 823-829, or Dubnau and Davidoff-Abelson, 1971, Journal of Molecular
Biology 56: 209-221), electroporation (see, e.g., Shigekawa and Dower, 1988, Biotechniques 6:
742-751), or conjugation (see, e.g., Koehler and Thorne, 1987, Journal of Bacteriology 169:
5771-5278). Examples of suitable filamentous fungal host cells include strains of Aspergillus,
e.g. A. oryzae, A. niger, or A. nidulans, Fusarium or Trichoderma. Fungal cells may be
transformed by a process involving protoplast formation, transformation of the protoplasts,
and regeneration of the cell wall in a manner known per se. Suitable procedures for
transformation of Aspergillus host cells are described in EP 238 023 and US 5,679,543.
Suitable methods for transforming Fusarium species are described by Malardier et al., 1989,
Gene 78: 147-156 and WO 96/00787. Examples of suitable yeast host cells include strains of
Saccharomyces, e.g. S. cerevisiae, Schizosaccharomyces, Kluyveromyces, Pichia, such as
P. pastoris or P. methanolica, Hansenula, such as H. polymorpha, or Yarrowia. Yeast may be
transformed using the procedures described by Becker and Guarente, in Abelson, J.N. and
Simon, M.I., editors, Guide to Yeast Genetics and Molecular Biology, Methods in Enzymology,
Volume 194, pp 182-187, Academic Press, Inc., New York; Ito et al., 1983, Journal of
Bacteriology 153: 163; Hinnen et al., 1978, Proceedings of the National Academy of Sciences
USA 75: 1920; and as disclosed by Clontech Laboratories, Inc, Palo Alto, CA, USA (in the
product protocol for the Yeastmaker™ Yeast Transformation System Kit). Examples of
suitable insect host cells include a Lepidoptera cell line, such as Spodoptera frugiperda (Sf9 or
Sf21) or Trichoplusia ni cells (High Five) (US 5,077,214). Transformation of insect cells and
production of heterologous polypeptides therein may be performed as described by Invitrogen.
Examples of suitable mammalian host cells include Chinese hamster ovary (CHO) cell lines
(e.g. CHO-K1; ATCC CCL-61), green monkey cell lines (COS) (e.g. COS 1 (ATCC CRL-1650),
COS 7 (ATCC CRL-1651)), mouse cells (e.g. NS/0), baby hamster kidney (BHK) cell lines
(e.g. ATCC CRL-1632 or ATCC CCL-10), and human cells (e.g. HEK 293 (ATCC CRL-1573)),
as well as plant cells in tissue culture. Additional suitable cell lines are known in the art and
available from public depositories such as the American Type Culture Collection, Rockville,
Maryland. Methods for introducing exogenous DNA into mammalian host cells include calcium
phosphate-mediated transfection, electroporation, DEAE-dextran mediated transfection,
liposome-mediated transfection, viral vectors and the transfection method described by Life
Technologies Ltd, Paisley, UK using Lipofectamine 2000. These methods are well known in the
art and are described e.g. by Ausubel et al. (eds.), 1996, Current Protocols in Molecular
Biology, John Wiley & Sons, New York, USA. The cultivation of mammalian cells is conducted
according to established methods, e.g. as disclosed in Animal Cell Biotechnology, Methods and
Protocols, edited by Nigel Jenkins, 1999, Humana Press Inc, Totowa, New Jersey, USA, and
Harrison MA and Rae IF, General Techniques of Cell Culture, Cambridge University Press 1997.

In the production methods of the present invention, the cells are cultivated in a nutrient
medium suitable for production of the polypeptide using methods known in the art. For example,
the cells may be cultivated by shake flask cultivation, or by small-scale or large-scale
fermentation (including continuous, batch, fed-batch, or solid state fermentations) in laboratory
or industrial fermenters, performed in a suitable medium and under conditions allowing the
polypeptide to be expressed and/or isolated. The cultivation takes place in a suitable nutrient
medium comprising carbon and nitrogen sources and inorganic salts, using procedures known in
the art. Suitable media are available from commercial suppliers or may be prepared according
to published compositions (e.g. in catalogues of the American Type Culture Collection). If the
polypeptide is secreted into the nutrient medium, it can be recovered directly from the medium.
If the polypeptide is not secreted, it can be recovered from cell lysates.

The resulting polypeptide may be recovered by methods known in the art. For example, it 
may be recovered from the nutrient medium by conventional procedures including, but not 
limited to, centrifugation, filtration, extraction, spray drying, evaporation, or precipitation. 

The polypeptides may be purified by a variety of procedures known in the art including, but 
not limited to, chromatography (e.g. ion exchange, affinity, hydrophobic, chromatofocusing, 
and size exclusion), electrophoretic procedures (e.g. preparative isoelectric focusing), differ- 
ential solubility (e.g. ammonium sulfate precipitation), SDS-PAGE, or extraction (see e.g. 
Protein Purification, J.-C. Janson and Lars Ryden, editors, VCH Publishers, New York, 
1989). Specific methods for purifying polypeptides exhibiting FSH activity have been de- 
scribed (Human Cytokines, Handbook of Basic and Clinical Research, Volume II, Blackwell 
Science, Eds. Aggarwal and Gutterman, 1996, pp. 19-42). 

Homogeneous preparation of a conjugate of the invention 

In a further aspect the invention relates to a substantially homogeneous preparation of a con- 
jugate of the invention. In the present context a "substantially homogeneous preparation" is a 
preparation, typically in a suitable buffer, containing more than 50%, such as more than 75% 
and preferably more than 85%, or more than 90% identical conjugates, i.e. having the same 
degree and nature of conjugation. The substantially homogeneous preparation is conveniently 
obtained by ensuring that the polypeptide part of the conjugate contains the necessary number 
of attachment groups, located at the surface of the molecule in such a way that all attachment 
groups can be conjugated to the non-polypeptide moiety of choice, when the conjugation is 
performed in the presence of a molar excess of the non-polypeptide moiety relative to the 
polypeptide. Preferably, the non-polypeptide moiety to be used in this aspect of the invention 
is a polymer molecule. 

Pharmaceutical composition of the invention and its use 

In one aspect the polypeptide, the conjugate or the pharmaceutical composition according to 
the invention is used for the manufacture of a medicament for treatment of infertility or dis- 
eases associated with insufficient endogenous production of FSH. 

In another aspect the polypeptide, the conjugate or the pharmaceutical composition according 
to the invention is used in a method of treating an infertile mammal, in particular a human, 
comprising administering to the mammal in need thereof such polypeptide, conjugate or 
pharmaceutical composition. 

The polypeptide exhibiting FSH activity of the invention or the conjugate of the invention is
administered at a dose approximately paralleling that employed in therapy with rhFSH such as
Gonal-F® and Puregon®. However, owing to the increased functional in vivo half-life of the
conjugate of the invention, the product should be administered less frequently and at a dose
which provides an effect comparable to that obtained in current therapy. Accordingly, the exact
dose to be administered depends on the circumstances, including the patient to be treated, the
cause of infertility if known, the status of the ovaries, the patient's plasma FSH concentration
prior to treatment, and the functional in vivo half-life of the product. Normally, in the
treatment of infertility the dose should be capable of stimulating follicle maturation, e.g. of
inducing follicles to grow about 2 mm per day during a period of 8-9 days. For instance, for a
product having a functional in vivo half-life of 3-4 days, two doses should be given at least
three days apart if a relatively stable plasma concentration is desired. Analogously, for a
product having a functional in vivo half-life of about 6 days, one dose may suffice for the
entire stimulation period.
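The dosing arithmetic above can be illustrated with a simple superposition calculation. This is a sketch under stated assumptions, not a dosing rule: it assumes a one-compartment model with instantaneous absorption and first-order elimination, so each dose decays as 0.5 ** (elapsed / half-life); real subcutaneous FSH kinetics include an absorption phase that is ignored here. The function name `relative_level` is hypothetical.

```python
"""Illustrative superposition of first-order decay curves (simplified model)."""


def relative_level(t_days: float, dose_times: list[float], t_half_days: float) -> float:
    """Relative plasma level at time t, in units of a single dose's peak.

    Assumes instantaneous absorption and first-order elimination, so each
    dose contributes 0.5 ** (elapsed / t_half) once it has been given.
    """
    level = 0.0
    for t_dose in dose_times:
        elapsed = t_days - t_dose
        if elapsed >= 0:  # doses not yet given contribute nothing
            level += 0.5 ** (elapsed / t_half_days)
    return level


if __name__ == "__main__":
    # Daily profile over a 9-day stimulation period (the text cites 8-9 days).
    scenarios = [
        ("t1/2 = 3.5 d, doses on day 0 and day 4", [0.0, 4.0], 3.5),
        ("t1/2 = 6 d, single dose on day 0", [0.0], 6.0),
    ]
    for label, doses, t_half in scenarios:
        profile = [round(relative_level(day, doses, t_half), 2) for day in range(10)]
        print(label, profile)
```

Under this simplified model, a single dose with a 6-day half-life still leaves 0.5 ** 1.5, or about 35% of the peak level, on day 9, i.e. a slowly decreasing profile rather than repeated peaks.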

The composition of the invention may be exceedingly advantageous when employed in a
step-down protocol, i.e. a protocol in which decreasing dosages of FSH are given during the
stimulation period, whereas use of the composition may itself provide such a slowly decreasing
plasma concentration of FSH.

It will be apparent to those of skill in the art that an effective amount of a conjugate, prepara- 
tion or composition of the invention depends, inter alia, upon the disease, the dose, the ad- 
ministration schedule, whether the polypeptide or conjugate or composition is administered 
alone or in conjunction with other therapeutic agents, the serum half-life of the compositions, 
and the general health of the patient. Typically, an effective dose of the conjugate, prepara- 
tion or composition of the invention is sufficient to ensure development and maturation of 
follicles at a rate and to a degree compatible with that obtained using standard rhFSH such as 
Gonal-F® and Puregon®. 

A further contemplated advantage is that the more stable plasma concentration obtained with 
a composition of the invention results in a more efficient development and maturation of fol- 
licles, which subsequently may enable a higher pregnancy rate. 

The polypeptide or conjugate of the invention is preferably administered in a composition 
including a pharmaceutically acceptable carrier or excipient. "Pharmaceutically acceptable" 
means a carrier or excipient that does not cause any untoward effects in patients to whom it is 
administered. Such pharmaceutically acceptable carriers and excipients are well known in the 
art. 

The polypeptide or conjugate of the invention can be formulated into pharmaceutical
compositions by well-known methods. Suitable formulations are described in Remington's
Pharmaceutical Sciences by E.W. Martin (Mack Publishing Co., 16th Ed., 1980).

The pharmaceutical composition of the polypeptide or conjugate of the invention may be for- 
mulated in a variety of forms, including liquids, e.g. ready-to-use solutions or suspensions, 
gels, lyophilized, or any other suitable form, e.g. powder or crystals suitable for preparing a 
solution. The preferred form will depend upon the particular indication being treated and will 
be apparent to one of skill in the art. 

The pharmaceutical composition containing the polypeptide or conjugate of the invention may be
administered intravenously, intramuscularly, intraperitoneally, intradermally, subcutaneously,
sublingually, buccally, intranasally, transdermally, by inhalation, or in any other acceptable
manner, e.g. using PowderJect® or ProLease® technology or a pen injection system. The
preferred mode of administration will depend upon the particular indication being treated and
will be apparent to one of skill in the art. In particular, subcutaneous administration is
advantageous, since it allows patients to administer the composition themselves.

The pharmaceutical composition of the invention may be administered in conjunction with other
therapeutic agents. These agents may be incorporated as part of the same pharmaceutical
composition or may be administered separately from the polypeptide or conjugate of the
invention, either concurrently or in accordance with any other acceptable treatment schedule.
In addition, the polypeptide, conjugate or pharmaceutical composition of the invention may be
used as an adjunct to other therapies.

By providing a more stable FSH plasma concentration just above the threshold level for follicle
growth, the composition of the invention is of particular interest for the treatment of women
suffering from anovulation WHO type I, II or III, since only 1-2 mature follicles are desired in
these patients.

Furthermore, the invention relates to the use of a composition of the invention in a step-down
protocol in which decreasing plasma FSH concentrations are obtained using only one injection,
to the use of a composition of the invention in a step-up protocol in which an increase in FSH
concentrations is obtained faster using lower individual as well as total doses, and to the use of
a composition of the invention in combination with compounds for in vitro maturation (sterol
derivatives such as FF-MAS, and media containing growth and maturation factors known in the
art).

Mixtures of FSH and LH activities (hMG) are routinely used in the treatment of human
infertility. This particular combination therapy may be advantageous because gonadal support of
gamete maturation is dependent upon the synergistic actions of both FSH and LH. Current
treatment protocols requiring FSH and LH activity utilize urinary extracts from postmenopausal
women. The use of these extracts is compromised by several factors, including variability.

It will in some cases be advantageous to administer the composition of the invention as part of
a treatment protocol that also involves LH and/or hCG, for example recombinant LH and/or hCG.
This may in particular be useful for the treatment of women with low endogenous LH levels.
Finally, the composition of the invention may be used, possibly in combination with LH, in the
treatment of male infertility, in particular hypogonadotrophic hypogonadism and oligo- or
azoospermia. The more stable plasma concentration obtained with a composition of the invention
may lead to more efficient spermatogenesis.

The present invention will be further illustrated by the following non-limiting examples and 
methods. 



MATERIALS AND METHODS
Sequence numbering 

The amino acid sequence of hFSH-α is numbered according to the mature sequence shown in
SEQ ID NO 2; an (a) suffix herein indicates the α chain. The amino acid sequence of hFSH-β is
numbered according to the mature sequence shown in SEQ ID NO 4; a (b) suffix herein indicates
the β chain.

Structures 

hFSH-α is identical to the α chain of human chorionic gonadotropin (hCG), for which two
published structures are available: Wu, H., Lustbader, J. W., Liu, Y., Canfield, R. E.,
Hendrickson, W. A.: Structure 2 pp. 545 (1994) and Lapthorn, A. J., Harris, D. C., Littlejohn,
A., Lustbader, J. W., Canfield, R. E., Machin, K. J., Morgan, F. J., Isaacs, N. W.: Nature 369
pp. 455 (1994), both including the β chain of hCG. The β chain of hFSH is 32 percent identical
to the amino acid sequence of the structural part of the β chain of hCG (see the sequence
alignment of Figure 1). A series of 50 models of the 3D structure of FSH was built based on the
two available hCG structures above and on the sequence alignment in Figure 1, using the program
Modeller 98 (MSI Inc., 1999). The four N-terminal residues (A1(a), P2(a), D3(a) and V4(a)) and
the three C-terminal residues (H90(a), K91(a) and S92(a)) of the α chain were not modelled, as
they are not identified in the hCG structures. All of the hFSH-β chain was modelled, even the
part which has no homologous residues in the hCG structures.

Accessible Surface Area (ASA) 

The computer program Access (B. Lee and F.M. Richards, J. Mol. Biol. 55: 379-400 (1971))
version 2 (Copyright (c) 1983 Yale University) was used to compute the accessible surface area
(ASA) of the individual atoms in the structure. This method typically uses a probe size of 1.4 Å
and defines the accessible surface area as the area traced by the centre of the probe. Prior to
this calculation, all water molecules and all hydrogen atoms should be removed from the
coordinate set, as should other atoms not directly related to the protein.

Fractional ASA of side chain

The fractional ASA of the side chain atoms is computed by dividing the sum of the ASA of the
atoms in the side chain by a value representing the ASA of the side chain atoms of that residue
type in an extended Ala-X-Ala tripeptide, see Hubbard, Campbell & Thornton (1991) J. Mol.
Biol. 220, 507-530. For this example the CA atom is regarded as part of the side chain of
glycine residues but not of other residues. The following values are used as the standard 100%
ASA for the side chain:



Residue   Standard 100% side-chain ASA
Ala        69.23 Å²
Arg       200.35 Å²
Asn       106.25 Å²
Asp       102.06 Å²
Cys        96.69 Å²
Gln       140.58 Å²
Glu       134.61 Å²
Gly        32.28 Å²
His       147.00 Å²
Ile       137.91 Å²
Leu       140.76 Å²
Lys       162.50 Å²
Met       156.08 Å²
Phe       163.90 Å²
Pro       119.65 Å²
Ser        78.16 Å²
Thr       101.67 Å²
Trp       210.89 Å²
Tyr       176.61 Å²
Val       114.14 Å²

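The fractional side-chain ASA defined above is a simple ratio against the reference values in the table. A minimal Python sketch, assuming the per-residue side-chain ASA (in Å²) has already been summed from the atom-level output of Access or a similar program; `REFERENCE_SIDE_CHAIN_ASA` and `fractional_asa` are illustrative names, not part of Access.

```python
"""Fractional side-chain ASA relative to Ala-X-Ala reference values (Å^2)."""

# Standard 100% side-chain ASA values from the table above.
REFERENCE_SIDE_CHAIN_ASA = {
    "ALA": 69.23, "ARG": 200.35, "ASN": 106.25, "ASP": 102.06, "CYS": 96.69,
    "GLN": 140.58, "GLU": 134.61, "GLY": 32.28, "HIS": 147.00, "ILE": 137.91,
    "LEU": 140.76, "LYS": 162.50, "MET": 156.08, "PHE": 163.90, "PRO": 119.65,
    "SER": 78.16, "THR": 101.67, "TRP": 210.89, "TYR": 176.61, "VAL": 114.14,
}


def fractional_asa(residue_type: str, side_chain_asa: float) -> float:
    """Return side-chain ASA as a fraction of the reference (1.0 = 100%).

    For glycine the CA atom should already be counted in side_chain_asa,
    as stated in the text; for all other residues it should not be.
    """
    return side_chain_asa / REFERENCE_SIDE_CHAIN_ASA[residue_type.upper()]
```

For example, a lysine side chain with 81.25 Å² exposed has a fractional ASA of 0.5, i.e. 50%.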
Determination of surface exposed residues from structural models:
Surface accessibility and fractional ASA of side chains were calculated for each of the 50 model
structures. The average value over the structural ensemble was used in the following. The N-
and C-terminal residues of the FSH-α chain not included in the model are defined as having
100% side chain accessibility.
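The averaging and classification steps can be sketched as follows, assuming a hypothetical input layout: one dictionary per model that maps a residue label such as "K45(a)" to its fractional side-chain ASA. Residues absent from the models (the terminal α-chain residues) are set to 1.0, as the text specifies, and the cutoffs 0.25 and 0.50 correspond to the two residue lists that follow. The function names are illustrative.

```python
"""Average fractional side-chain ASA over a model ensemble and apply cutoffs."""

from statistics import mean

# Terminal hFSH-alpha residues absent from the models; defined as 100% exposed.
UNMODELLED = ["A1(a)", "P2(a)", "D3(a)", "V4(a)", "H90(a)", "K91(a)", "S92(a)"]


def ensemble_average(models: list[dict[str, float]]) -> dict[str, float]:
    """Mean fractional ASA per residue over all models (same keys in each model)."""
    residues = models[0].keys()
    averaged = {res: mean(model[res] for model in models) for res in residues}
    for res in UNMODELLED:
        averaged[res] = 1.0
    return averaged


def exposed(averaged: dict[str, float], cutoff: float) -> list[str]:
    """Residues whose averaged fractional side-chain ASA is more than the cutoff."""
    return [res for res, frac in averaged.items() if frac > cutoff]
```

Calling `exposed(averaged, 0.25)` and `exposed(averaged, 0.50)` would give the "more than 25%" and "more than 50%" sets respectively.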

The following amino acid residues in hFSH-α and hFSH-β, respectively, have more than 25% of
their side chain exposed to the surface:

A1(a), P2(a), D3(a), V4(a), Q5(a), D6(a), P8(a), E9(a), T11(a), L12(a), Q13(a), E14(a),
P16(a), F17(a), Q20(a), P21(a), G22(a), A23(a), P24(a), L26(a), M29(a), F33(a), R42(a),
S43(a), K44(a), K45(a), T46(a), L48(a), V49(a), Q50(a), N52(a), V61(a), K63(a), S64(a),
Y65(a), N66(a), R67(a), V68(a), T69(a), M71(a), G72(a), G73(a), F74(a), K75(a), N78(a),
T80(a), A81(a), H83(a), C84(a), S85(a), T86(a), Y88(a), Y89(a), H90(a), K91(a), S92(a),
N1(b), S2(b), E4(b), L5(b), T6(b), N7(b), I8(b), T9(b), K14(b), E15(b), E16(b), R18(b),
F19(b), I21(b), S22(b), N24(b), Y31(b), Y33(b), R35(b), D36(b), L37(b), Y39(b), K40(b),
D41(b), P42(b), A43(b), R44(b), P45(b), K46(b), I47(b), K49(b), K54(b), E55(b), L56(b),
V57(b), Y58(b), E59(b), T60(b), V61(b), R62(b), P64(b), G65(b), A67(b), H68(b), H69(b),
D71(b), L73(b), Y74(b), T75(b), T80(b), Q81(b), H83(b), G85(b), K86(b), D88(b), S89(b),
D90(b), S91(b), D93(b), T95(b), V96(b), R97(b), G98(b), L99(b), G100(b), Y103(b),
S105(b), F106(b), G107(b), E108(b), M109(b), K110(b), and E111(b).

The following amino acid residues have more than 50% of their side chain exposed to the 
surface: 

A1(a), P2(a), D3(a), V4(a), Q5(a), D6(a), P8(a), E9(a), T11(a), Q13(a), E14(a), P16(a),
F17(a), Q20(a), P21(a), G22(a), A23(a), K45(a), T46(a), L48(a), V49(a), Q50(a), N52(a),
K63(a), S64(a), N66(a), R67(a), T69(a), G72(a), G73(a), K75(a), T86(a), Y89(a), H90(a),
K91(a), S92(a), N1(b), N7(b), T9(b), E15(b), E16(b), R18(b), F19(b), N24(b), Y33(b),
D41(b), P42(b), A43(b), R44(b), P45(b), K46(b), I47(b), K54(b), E55(b), V57(b), Y58(b),
E59(b), R62(b), P64(b), G65(b), A67(b), H68(b), H69(b), D71(b), L73(b), T75(b), Q81(b),
H83(b), K86(b), D88(b), S89(b), D90(b), S91(b), T95(b), R97(b), G98(b), L99(b), G100(b),
Y103(b), S105(b), F106(b), G107(b), E108(b), M109(b), K110(b), and E111(b).

Determining distances between atoms 

The distance between atoms is most easily determined using molecular graphics software,
e.g. Insight II v. 98.0, MSI Inc.
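Where molecular graphics software is not at hand, the same distance can be computed directly from atomic coordinates. A minimal sketch, assuming the model is stored in standard PDB format (fixed-column ATOM/HETATM records, coordinates in Å); `parse_atoms` and `distance` are illustrative helper names.

```python
"""Euclidean distance between two atoms read from PDB-format ATOM records."""

import math


def parse_atoms(pdb_text: str) -> dict[tuple[str, int, str], tuple[float, float, float]]:
    """Map (chain, residue number, atom name) -> (x, y, z) using PDB fixed columns."""
    atoms = {}
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            atom_name = line[12:16].strip()   # columns 13-16
            chain = line[21]                  # column 22
            res_seq = int(line[22:26])        # columns 23-26
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms[(chain, res_seq, atom_name)] = xyz
    return atoms


def distance(a: tuple[float, float, float], b: tuple[float, float, float]) -> float:
    """Distance in the units of the coordinates (Å for PDB files)."""
    return math.dist(a, b)
```

Atoms are then looked up by chain, residue number and atom name, e.g. `atoms[("A", 45, "NZ")]` for a lysine side-chain nitrogen, assuming the model uses chain A for the α subunit.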

Methods used to determine the in vitro and in vivo activity of rhFSH and variants thereof 
In vitro bioactivity 

The in vitro bioactivity of conjugates or polypeptides of the invention exhibiting FSH activity
may be determined by an FSH receptor activation assay. A suitable assay is the CHO-luc assay
described by Chappel et al., Human Reproduction, 1998, 13(3), pp 18-35. In brief, a culture of
CHO cells expressing the human FSH receptor (Kelton et al., 1992, Mol. Cell. Endocrinol., 89,
141-151) and firefly luciferase is incubated with the polypeptide or conjugate to be tested, and
the luminescence signal is measured using a Packard TopCount or a similar luminescence reader.

The bioactivity of the conjugates or polypeptides of the invention may also be measured using
the CHO cell line expressing the hFSH receptor, by determining the ability of the polypeptide or
conjugate to elicit cAMP production, using a standard cAMP assay, for instance one based on
scintillation proximity assay (SPA) technology.

Alternatively, in vitro bioactivity may be determined by incubating Y1 cells expressing the FSH
receptor with the polypeptide or conjugate as described by Chappel et al., op. cit. FSH receptor
activation results in increased production of progesterone, which can be measured by
radioimmunoassay, and a dose-response relationship is established between the amount of FSH
added to the Y1 cells and progesterone release.
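Dose-response relationships of this kind are often summarised with a four-parameter logistic (4PL) model. That model choice is an assumption here, since the text does not specify one. The sketch below only evaluates the curve and inverts it to read a sample's concentration off a standard curve whose parameters (`bottom`, `top`, `ec50`, `hill`) have already been fitted; the fitting itself is not shown, and the function names are illustrative.

```python
"""Four-parameter logistic (4PL) dose-response curve and its inverse."""


def four_pl(conc: float, bottom: float, top: float, ec50: float, hill: float) -> float:
    """Response at a given concentration (conc > 0)."""
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** hill)


def inverse_four_pl(response: float, bottom: float, top: float,
                    ec50: float, hill: float) -> float:
    """Concentration giving `response` on an increasing curve (top > bottom).

    Solves response = bottom + (top - bottom) / (1 + (ec50 / c) ** hill) for c.
    """
    if not bottom < response < top:
        raise ValueError("response must lie strictly between bottom and top")
    return ec50 / ((top - bottom) / (response - bottom) - 1.0) ** (1.0 / hill)
```

At `conc == ec50` the model returns the midpoint between `bottom` and `top`. Responses outside the asymptotes cannot be back-calculated, which is why the inverse raises an error.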

Alternatively, the ability of a polypeptide or conjugate of the invention to compete with hFSH
for the binding sites is analyzed by incubation with a labeled FSH analog, for instance
biotinylated hFSH or radioiodinated hFSH.

The extracellular domains of the hFSH receptor can optionally be coupled to Fc and immobilized
in 96-well plates. rhFSH or variants thereof are subsequently added, and their binding is
detected using either specific anti-hFSH antibodies or biotinylated or radioiodinated hFSH.

Measurement of the in vivo half-life of conjugated and unconjugated rhFSH and variants 
thereof 

Measurement of functional in vivo half-life can be carried out in a number of ways, as
described in the literature. For instance, the ability of the conjugates or polypeptides of the
invention, given once to a laboratory animal, to continue to stimulate the maturation of follicles
may be detected with e.g. ultrasound equipment and compared to that of rhFSH. An indirect
measure is to test the FSH bioactivity of plasma samples drawn at different time points from
animals treated with the subject of the invention or with rhFSH. The bioactivity can be measured
using the in vitro assays mentioned above.
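For the indirect measurement, an apparent half-life can be estimated from the bioactivity of plasma samples drawn at different time points. A sketch, assuming a single exponential elimination phase: fit ln(activity) against time by ordinary least squares and convert the slope with t1/2 = ln 2 / k. Activities must be positive and in consistent units; the function name is illustrative.

```python
"""Estimate an apparent half-life from (time, activity) pairs by log-linear regression."""

import math


def half_life(times: list[float], activities: list[float]) -> float:
    """Return t1/2 = ln(2) / k, where -k is the least-squares slope of ln(activity) vs time."""
    if len(times) != len(activities) or len(times) < 2:
        raise ValueError("need at least two paired observations")
    logs = [math.log(a) for a in activities]  # requires activities > 0
    t_mean = sum(times) / len(times)
    y_mean = sum(logs) / len(logs)
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("activity is not decreasing; no elimination phase to fit")
    return math.log(2) / -slope
```

The result is in the time unit of `times`. Comparing the values obtained for a conjugate and for rhFSH from samples taken at the same time points gives the relative prolongation.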

Determination of the molecular size of hFSH and variants thereof 

The molecular weight of a conjugate or polypeptide of the invention is determined by SDS-PAGE,
gel filtration, matrix-assisted laser desorption mass spectrometry or equilibrium
centrifugation.

Methods for PEGylation of hFSH and variants thereof

PEGylation in microtiter plates of a tagged polypeptide with FSH activity 
The polypeptide exhibiting FSH activity is expressed with a suitable tag, e.g. any of the tags
exemplified in the general description above, and the culture broth is transferred to one or more
wells in a microtiter plate capable of immobilising the tagged polypeptide. When the tag is
Met-Lys-His-Gln-His-Gln-His-Gln-His-Gln-His-Gln-His-Gln-Gln, a nickel-nitrilotriacetic
acid (Ni-NTA) HisSorb microtiter plate, commercially available from Qiagen, can be used.

After allowing for immobilisation of the tagged polypeptide on the microtiter plate, the wells
are washed in a buffer suitable for binding and subsequent PEGylation, followed by incubation
of the wells with the activated PEG of choice. As an example, M-SPA-5000 from Shearwater
Polymers may be used. The molar ratio of activated PEG to polypeptide should be optimised,
but will typically be greater than 10:1, more typically greater than 100:1. After a suitable
reaction time at ambient temperature, typically around 1 hour, the reaction is stopped by
removal of the activated PEG solution. The conjugated protein is eluted from the plate by
incubation with a suitable buffer. Suitable elution buffers may contain imidazole, excess NTA
or another chelating compound. The conjugated protein is assayed for biological activity and
immunogenicity as appropriate. The tag may optionally be cleaved off using a method known
in the art, e.g. using diaminopeptidase, the Gln in position -1 being converted to pyroglutamyl
with GCT (glutamylcyclotransferase) and finally cleaved off with PGAP (pyroglutamyl
aminopeptidase), giving the protein. The process involves several steps of metal chelate
affinity chromatography. Alternatively, the tagged polypeptide may be conjugated.
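The three enzymatic steps of tag removal can be traced at the sequence level. The sketch below is a simplified model, not a description of the enzymes' full specificity: it assumes diaminopeptidase removes N-terminal dipeptides until a Gln reaches the N-terminus, that GCT then cyclises this Gln to pyroglutamate, and that PGAP removes the pyroglutamate. Real diaminopeptidase specificity includes further stop points, which this sketch ignores. `TAG` and `remove_tag` are illustrative names.

```python
"""Sequence-level trace of diaminopeptidase / GCT / PGAP tag removal (simplified)."""

# Tag from the text, Met-Lys-(His-Gln)6-Gln, in one-letter code.
TAG = "MKHQHQHQHQHQHQQ"


def remove_tag(sequence: str) -> str:
    """Return the sequence remaining after the simplified three-step digestion."""
    # Step 1 (diaminopeptidase): cut off N-terminal dipeptides until Gln is first.
    while len(sequence) >= 2 and sequence[0] != "Q":
        sequence = sequence[2:]
    # Steps 2 and 3: GCT cyclises the N-terminal Gln to pyroglutamate,
    # which PGAP then removes, exposing the mature N-terminus.
    if sequence.startswith("Q"):
        sequence = sequence[1:]
    return sequence
```

With this tag, seven dipeptides (Met-Lys and six His-Gln pairs) are removed before the Gln at position -1 becomes N-terminal, so `remove_tag(TAG + "APDVQD")` returns `"APDVQD"`.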

PEGylation of a polypeptide exhibiting FSH activity and having a blocked receptor-binding 
site 

The following method can be used to optimize PEGylation of hFSH in a manner excluding 
PEGylation of lysines involved in receptor recognition. 

A complex consisting of an FSH polypeptide and the soluble domain of the FSH receptor in a 1:1
stoichiometry is formed in a PBS buffer at pH 7. The concentration of FSH polypeptide is
approximately 20 µg/ml or 1 µM, and the receptor is present at an equimolar concentration.

M-SPA-5000 from Shearwater Polymers, Inc. is added at 3 different concentration levels,
corresponding to a 5-, 20- and 100-fold molar excess over the FSH polypeptide. The reaction time
is 30 min at RT. After the 30 min reaction period, the pH of the reaction mixture is adjusted to
2.0, and the reaction mixture is applied to a Vydac C18 column and eluted with an acetonitrile
gradient essentially as described (Utsumi et al., J. Biochem., vol. 101, 1199-1208, 1987).
Alternatively, and more elegantly, an isopropanol gradient can be used.
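The amounts of reagent implied by these molar excesses follow from simple stoichiometry. A sketch, assuming a nominal molecular weight of 5,000 g/mol for M-SPA-5000 (the exact reagent mass depends on the supplier's specification) and the 1 µM polypeptide concentration given above; `peg_mass_ug` is an illustrative name.

```python
"""Mass of activated PEG needed for a given molar excess over the polypeptide."""


def peg_mass_ug(polypeptide_um: float, volume_ml: float, molar_excess: float,
                peg_mw_g_per_mol: float = 5000.0) -> float:
    """Return the PEG mass in micrograms.

    polypeptide_um: polypeptide concentration in µmol/L (µM)
    volume_ml: reaction volume in mL
    molar_excess: mol PEG per mol polypeptide
    peg_mw_g_per_mol: assumed nominal PEG molecular weight
    """
    polypeptide_nmol = polypeptide_um * volume_ml      # µmol/L x mL = nmol
    peg_nmol = polypeptide_nmol * molar_excess
    return peg_nmol * peg_mw_g_per_mol / 1000.0        # nmol x g/mol = ng, then µg
```

Under these assumptions, 1 mL of the 1 µM complex needs about 25, 100 and 500 µg of PEG reagent for the 5-, 20- and 100-fold excesses.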

Fractions are analyzed using the primary screening assay described herein and active PEGy- 
lated FSH polypeptide obtained by this method is stored at -80°C in PBS, pH 7 containing 1 
mg/ml human serum albumin (HSA). 

Strategy for preparing a conjugate of the invention comprising PEG 

rhFSH as well as all possible muteins of FSH comprising a single lysine to arginine substitu- 
tion are prepared and characterized with respect to specific activity as compared to rhFSH to 
establish which, if any, lysines are critical for activity of the molecule and which may be sub- 
stituted by arginine with an acceptable retention of activity. 
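The set of single lysine-to-arginine muteins can be enumerated directly from the mature sequences. The sequences below are the public mature hFSH α and β sequences (UniProt P01215 and P01225). They are expected to correspond to SEQ ID NO 2 and 4 but should be checked against them. Labels follow this document's (a)/(b) suffix convention, and the function name is illustrative.

```python
"""Enumerate all single Lys -> Arg substitutions in mature hFSH-alpha and hFSH-beta."""

HFSH_ALPHA = (
    "APDVQDCPECTLQENPFFSQPGAPILQCMGCCFSRAYPTPLRSKKTMLVQ"
    "KNVTSESTCCVAKSYNRVTVMGGFKVENHTACHCSTCYYHKS"
)
HFSH_BETA = (
    "NSCELTNITIAIEKEECRFCISINTTWCAGYCYTRDLVYKDPARPKIQKT"
    "CTFKELVYETVRVPGCAHHADSLYTYPVATQCHCGKCDSDSTDCTVRGLG"
    "PSYCSFGEMKE"
)


def lys_to_arg_muteins(sequence: str, chain_suffix: str) -> dict[str, str]:
    """Map mutation labels such as 'K45R(a)' to the corresponding mutant sequence."""
    muteins = {}
    for index, residue in enumerate(sequence):
        if residue == "K":
            position = index + 1  # numbering follows the mature sequence
            mutant = sequence[:index] + "R" + sequence[index + 1:]
            muteins[f"K{position}R({chain_suffix})"] = mutant
    return muteins


if __name__ == "__main__":
    all_muteins = {**lys_to_arg_muteins(HFSH_ALPHA, "a"),
                   **lys_to_arg_muteins(HFSH_BETA, "b")}
    print(len(all_muteins), sorted(all_muteins))
```

This yields 13 single-substitution muteins: six in the α chain (K44, K45, K51, K63, K75, K91) and seven in the β chain (K14, K40, K46, K49, K54, K86, K110).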

Subsequently, rhFSH and muteins thereof, namely muteins with inserted and/or deleted lysines,
are subjected to PEGylation by providing a surplus of SPA-PEG according to the procedure
disclosed in WO 97/03106. Next, the specific activity of these variants is measured. Muteins
permitting PEGylation with retention of acceptable activity are chosen for further work.

The above strategy may be repeated with any other attachment group, for example acidic 
residue substitution and suitable PEGylation chemistry. Muteins permitting PEGylation with 
retention of acceptable activity are chosen for further work. 

The selected muteins are subjected to PEGylation with SPA-PEG according to WO 97/03106 (or
another suitable PEGylation chemistry for the chosen attachment group) while varying the
molecular weight of the SPA-PEG. These molecules are checked for continued retention of
acceptable activity and characterized with respect to in vivo half-life according to the above
protocol of the Materials and Methods section. Muteins with an increased in vivo half-life are
selected and exemplify the invention disclosed and claimed herein.

EXAMPLE 1

Extension of the N-terminus of the FSH-α subunit with additional glycosylation sites
Construction of expression plasmids 

A gene encoding the human FSH-α subunit was constructed by PCR assembly of synthetic
oligonucleotides. The codon usage of the gene was optimised for high expression in
mammalian cells. Furthermore, to achieve high gene expression, an intron (from pCI-Neo
(Promega)) was included in the 5' untranslated region of the gene. The synthetic gene was
subcloned behind the CMV promoter in pcDNA3.1/Hygro (Invitrogen). The sequence of the
resulting plasmid, termed pBvdH977, is given in Figure 2 (FSH-α-coding sequence at positions
1225 to 1572). Similarly, a synthetic gene encoding the wildtype human FSH-β subunit was
constructed. In this construct, too, codon usage was optimised for high expression and an
intron was included in the recipient vector (pcDNA3.1/Zeo (Invitrogen)). The sequence of the
resulting FSH-β-containing plasmid, termed pBvdH1022, is given in Figure 3 (FSH-β-coding
sequence at positions 1231 to 1617). A construct containing a modified form of FSH-α having
two additional N-glycosylation sites at its N-terminus was generated by PCR: a DNA fragment
encoding the sequence Ala-Asn-Ile-Thr-Val-Asn-Ile-Thr-Val was inserted immediately upstream
of the mature FSH-α sequence in pBvdH977. The sequence of the resulting plasmid, termed
pBvdH1163, is given in Figure 4 (modified FSH-α-coding sequence at positions 1225 to 1599).
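The coordinates quoted for these constructs can be cross-checked by translating the coding sequence. The Python sketch below assumes the standard genetic code and uses the 348-nt FSH-α coding sequence transcribed from positions 1225 to 1572 of Figure 2. The helper names are hypothetical. The sketch checks that the translation reproduces the precursor in SEQ ID NO 1. It then inserts Ala-Asn-Ile-Thr-Val-Asn-Ile-Thr-Val directly after the 24-residue signal peptide, that is, at the junction to the mature chain of SEQ ID NO 2. Finally, it lists N-X-S/T sequons (X not Pro), the acceptor motif for N-linked glycosylation.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated with bases in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# FSH-alpha coding sequence, positions 1225-1572 of pBvdH977 (Figure 2), stop codon excluded.
ALPHA_CDS = (
    "ATGGACTACTACCGCAAGTACGCCGC"
    "CATCTTCCTGGTGACCCTGAGCGTGTTCCTGCACGTGCTGCACAGCGCCC"
    "CCGACGTGCAGGACTGCCCCGAGTGCACCCTGCAGGAGAACCCCTTCTTC"
    "AGCCAGCCCGGCGCCCCCATCCTGCAGTGCATGGGCTGCTGCTTCAGCCG"
    "CGCCTACCCCACCCCCCTGCGCAGCAAGAAGACCATGCTGGTGCAGAAGA"
    "ACGTGACCAGCGAGAGCACCTGCTGCGTGGCCAAGAGCTACAACCGCGTG"
    "ACCGTGATGGGCGGCTTCAAGGTGGAGAACCACACCGCCTGCCACTGCAG"
    "CACCTGCTACTACCACAAGAGC"
)
SIGNAL_LEN = 24       # residues preceding the mature alpha chain (SEQ ID NO 1 vs NO 2)
INSERT = "ANITVNITV"  # peptide inserted upstream of the mature chain in pBvdH1163


def translate(dna):
    """Translate an in-frame coding sequence codon by codon."""
    if len(dna) % 3:
        raise ValueError("coding sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def sequons(protein):
    """1-based start positions of N-X-S/T motifs where X is not proline."""
    return [i + 1 for i in range(len(protein) - 2)
            if protein[i] == "N" and protein[i + 1] != "P" and protein[i + 2] in "ST"]


wild = translate(ALPHA_CDS)
modified = wild[:SIGNAL_LEN] + INSERT + wild[SIGNAL_LEN:]

print(len(ALPHA_CDS) // 3, len(wild))
print(len(modified), (1599 - 1225 + 1) // 3)
print(sequons(wild[SIGNAL_LEN:]), sequons(modified[SIGNAL_LEN:]))
```

On the mature chain the insert should add two sequons, at positions 2 and 6, and shift the two natural sites by nine residues.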

Expression of wildtype FSH and an N-terminally α-modified form in CHO cells
For expression of wildtype FSH, 6.25 µg of pBvdH977 and 6.25 µg of pBvdH1022 were
co-transfected into Chinese Hamster Ovary (CHO) K1 cells (ATCC, CCL 61) using Lipofectamine
2000 (Life Technologies) according to the manufacturer's specifications. Culture media were
collected 40-48 h after transfection for analysis by Western blotting. For expression of the
modified form of FSH containing two additional glycosylation sites at the N-terminus of the
α subunit, 6.25 µg of pBvdH1163 and 6.25 µg of pBvdH1022 were co-transfected into CHO K1
cells, and culture media were collected 48 h after transfection, as for wildtype FSH.

Analysis of wildtype FSH and an N-terminally α-modified form by Western blotting
The FSH content of the samples was analysed by Western blotting: proteins were separated by
SDS-PAGE, and a Western blot was performed using rabbit anti-human FSH (AHP519, Serotec) as
primary antibody and an ImmunoPure Ultra Sensitive ABC Peroxidase Staining Kit (Pierce) for
detection. FSH forms in the sample derived from pBvdH1163 + pBvdH1022 migrated more slowly
than wildtype FSH in the samples derived from pBvdH977 + pBvdH1022. This indicated that
introducing glycosylation sites at the N-terminus of the α subunit leads to hyperglycosylation
of FSH.
CLAIMS

1. A polypeptide conjugate exhibiting FSH activity, comprising

i) a polypeptide comprising FSH-α and FSH-β subunits, wherein at least one of said FSH-α
and FSH-β subunits differs from the corresponding wildtype subunit in that at least one
amino acid residue comprising an attachment group for a non-polypeptide moiety
has been introduced or removed, and

ii) a non-polypeptide moiety bound to an attachment group of said polypeptide.

2. The conjugate according to claim 1, wherein the amino acid sequence of at least one of
said FSH-α and FSH-β subunits differs from that of the corresponding wildtype subunit in
that an amino acid residue comprising an attachment group for the non-polypeptide moiety
has been removed from the sequence.

3. The conjugate according to claim 1 or 2, wherein the amino acid sequence of at least one
of said FSH-α and FSH-β subunits differs from that of the corresponding wildtype subunit in
that an amino acid residue comprising an attachment group for the non-polypeptide moiety
has been introduced into the sequence.

4. The conjugate according to any of claims 1-3, wherein the amino acid sequence of FSH-α
differs from that of the corresponding wildtype subunit.

5. The conjugate according to any of claims 1-3, wherein the amino acid sequence of FSH-β
differs from that of the corresponding wildtype subunit.

6. The conjugate according to any of claims 1-5, wherein the corresponding wildtype subunit
is hFSH-α and/or hFSH-β.

7. The conjugate according to any of claims 1-6, wherein the non-polypeptide moiety is a
polymer molecule.

8. The conjugate according to any of claims 1-7, wherein the polymer molecule is
polyethylene glycol.

9. The conjugate according to any of claims 1-8, wherein the amino acid residue comprising
an attachment group for the non-polypeptide moiety is selected from the group consisting of a
lysine, asparagine, aspartic acid, glutamic acid, tyrosine and cysteine residue, preferably a
lysine residue.

10. The conjugate according to claim 9, which comprises a modified FSH-α having an amino
acid sequence which differs from that of hFSH-α in the removal of at least one lysine residue
selected from the group consisting of K44(a), K45(a), K51(a), K63(a), K75(a), and K91(a).

11. The conjugate according to claim 9 or 10, which comprises a modified FSH-β having an
amino acid sequence which differs from that of hFSH-β in the removal of at least one lysine
residue selected from the group consisting of K14(b), K40(b), K46(b), K49(b), K54(b),
K86(b), and K110(b).

12. The conjugate according to any of claims 9-11, wherein the modified FSH-α and modified
FSH-β subunits differ from the corresponding hFSH subunits in at least one of K45(a),
K63(a), K75(a), and K91(a), and at least one of K46(b), K54(b), K86(b), and K110(b).

13. The conjugate according to any of claims 6-12, wherein the polypeptide is glycosylated.

14. The conjugate according to claim 13, wherein the amino acid sequence of at least one of
FSH-α and FSH-β differs from that of the corresponding wildtype sequence in that an
N-glycosylation site has been introduced and/or removed.

15. A polypeptide conjugate exhibiting FSH activity comprising

i) a polypeptide comprising FSH-α and FSH-β subunits, wherein the amino acid sequence of
at least one of said FSH-α and FSH-β subunits differs from that of the corresponding
wildtype subunit in that at least one N-glycosylation site has been introduced, and

ii) an oligosaccharide moiety bound to an N-glycosylation site of said polypeptide.

16. The conjugate according to claim 15, wherein the amino acid sequence of at least one of
said FSH-α and FSH-β subunits further differs from that of the corresponding wildtype
subunit in that at least one naturally occurring N-glycosylation site has been removed.

17. The conjugate according to any of claims 13-16, wherein an N-glycosylation site has been
introduced by a mutation selected from the group consisting of P2(a)N+V4(a)S,
P2(a)N+V4(a)T, D3(a)N+Q5(a)S, D3(a)N+Q5(a)T, V4(a)N+D6(a)S, V4(a)N+D6(a)T,
D6(a)N+P8(a)S, D6(a)N+P8(a)T, E9(a)N+T11(a)S, E9(a)N, T11(a)N+Q13(a)S,
T11(a)N+Q13(a)T, L12(a)N+E14(a)S, L12(a)N+E14(a)T, E14(a)N+P16(a)S,
E14(a)N+P16(a)T, P16(a)N+F18(a)S, P16(a)N+F18(a)T, F17(a)N, F17(a)N+S19(a)T,
G22(a)N+P24(a)S, G22(a)N+P24(a)T, P24(a)N+L26(a)S, P24(a)N+L26(a)T,
F33(a)N+R35(a)S, F33(a)N+R35(a)T, R42(a)N+K44(a)S, R42(a)N+K44(a)T,
S43(a)N+K45(a)S, S43(a)N+K45(a)T, K44(a)N+T46(a)S, K44(a)N, K45(a)N+M47(a)S,
K45(a)N+M47(a)T, T46(a)N+L48(a)S, T46(a)N+L48(a)T, L48(a)N+Q50(a)S,
L48(a)N+Q50(a)T, V49(a)N+K51(a)S, V49(a)N+K51(a)T, Q50(a)N+N52(a)S,
Q50(a)N+N52(a)T, V61(a)N+K63(a)S, V61(a)N+K63(a)T, K63(a)N+Y65(a)S,
K63(a)N+Y65(a)T, S64(a)N+N66(a)S, S64(a)N+N66(a)T, Y65(a)N+R67(a)S,
Y65(a)N+R67(a)T, V68(a)S, V68(a)T, R67(a)N+T69(a)S, R67(a)N, T69(a)N+M71(a)S,
T69(a)N+M71(a)T, M71(a)N+G73(a)S, M71(a)N+G73(a)T, G72(a)N+F74(a)S,
G72(a)N+F74(a)T, G73(a)N+K75(a)S, G73(a)N+K75(a)T, F74(a)N+V76(a)S,
F74(a)N+V76(a)T, K75(a)N+E77(a)S, K75(a)N+E77(a)T, A81(a)N+H83(a)S,
A81(a)N+H83(a)T, H83(a)N, T86(a)N+Y88(a)S, T86(a)N+Y88(a)T, Y88(a)N+H90(a)S,
Y88(a)N+H90(a)T, Y89(a)N+K91(a)S, Y89(a)N+K91(a)T, H90(a)N and
H90(a)N+S92(a)T.

18. The conjugate according to any of claims 13-17, comprising a modified FSH-β having an
amino acid sequence which differs from that of hFSH-β in the introduction of at least one
N-glycosylation site by a mutation selected from the group consisting of S2(b)N+E4(b)S,
S2(b)N+E4(b)T, E4(b)N+T6(b)S, E4(b)N, L5(b)N+N7(b)S, L5(b)N+N7(b)T,
T6(b)N+I8(b)S, T6(b)N+I8(b)T, I8(b)N+I10(b)S, I8(b)N+I10(b)T, T9(b)N+A11(b)S,
T9(b)N+A11(b)T, K14(b)N+E16(b)S, K14(b)N+E16(b)T, F19(b)N+I21(b)S,
F19(b)N+I21(b)T, I21(b)N+I23(b)S, I21(b)N+I23(b)T, S22(b)N+N24(b)S,
S22(b)N+N24(b)T, Y31(b)N+Y33(b)S, Y31(b)N+Y33(b)T, Y33(b)N+R35(b)S,
Y33(b)N+R35(b)T, R35(b)N+L37(b)S, R35(b)N+L37(b)T, D36(b)N+V38(b)S,
D36(b)N+V38(b)T, L37(b)N+Y39(b)S, L37(b)N+Y39(b)T, K40(b)N+P42(b)S,
K40(b)N+P42(b)T, A43(b)N+P45(b)S, A43(b)N+P45(b)T, P45(b)N+I47(b)S,
P45(b)N+I47(b)T, K46(b)N+Q48(b)S, K46(b)N+Q48(b)T, I47(b)N+K49(b)S,
I47(b)N+K49(b)T, K54(b)N+L56(b)S, K54(b)N+L56(b)T, E55(b)N+V57(b)S,
E55(b)N+V57(b)T, L56(b)N+Y58(b)S, L56(b)N+Y58(b)T, V57(b)N+E59(b)S,
V57(b)N+E59(b)T, Y58(b)N+T60(b)S, Y58(b)N, E59(b)N+V61(b)S, E59(b)N+V61(b)T,
T60(b)N+R62(b)S, T60(b)N+R62(b)T, R62(b)N+P64(b)S, R62(b)N+P64(b)T,
G65(b)N+A67(b)S, G65(b)N+A67(b)T, A67(b)N+H69(b)S, A67(b)N+H69(b)T,
H68(b)N+A70(b)S, H68(b)N+A70(b)T, H69(b)N+D71(b)S, H69(b)N+D71(b)T,
D71(b)N+L73(b)S, D71(b)N+L73(b)T, L73(b)N+T75(b)S, L73(b)N, T75(b)N+P77(b)S,
T75(b)N+P77(b)T, H83(b)N+G85(b)S, H83(b)N+G85(b)T, K86(b)N+D88(b)S,
K86(b)N+D88(b)T, D88(b)N+D90(b)S, D88(b)N+D90(b)T, S89(b)N, S89(b)N+S91(b)T,
D90(b)N+T92(b)S, D90(b)N, S91(b)N+D93(b)S, S91(b)N+D93(b)T, D93(b)N+T95(b)S,
D93(b)N, T95(b)N+R97(b)S, T95(b)N+R97(b)T, V96(b)N+G98(b)S, V96(b)N+G98(b)T,
R97(b)N+L99(b)S, R97(b)N+L99(b)T, L99(b)N+P101(b)S, L99(b)N+P101(b)T,
Y103(b)N, Y103(b)N+S105(b)T, S105(b)N+G107(b)S, S105(b)N+G107(b)T,
F106(b)N+E108(b)S, F106(b)N+E108(b)T, G107(b)N+M109(b)S, G107(b)N+M109(b)T,
E108(b)N+K110(b)S, E108(b)N+K110(b)T, M109(b)N+E111(b)S, and
M109(b)N+E111(b)T.
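Each entry in the mutation lists of claims 17 and 18 substitutes residues in a mature subunit, numbered as in SEQ ID NO 2 for (a) and SEQ ID NO 4 for (b), so as to create an Asn-X-Ser/Thr motif. The hypothetical Python helper below parses that notation and checks that every stated wildtype residue matches the sequence. It then reports which sequons are new relative to the wildtype subunit. This is only a consistency check on the notation. It does not predict whether a new site is actually glycosylated in cells.

```python
import re

# Mature subunits: SEQ ID NO 2 ("a", alpha) and SEQ ID NO 4 ("b", beta).
HFSH = {
    "a": "APDVQDCPECTLQENPFFSQPGAPILQCMGCCFSRAYPTPLRSKKTMLVQ"
         "KNVTSESTCCVAKSYNRVTVMGGFKVENHTACHCSTCYYHKS",
    "b": "NSCELTNITIAIEKEECRFCISINTTWCAGYCYTRDLVYKDPARPKIQKT"
         "CTFKELVYETVRVPGCAHHADSLYTYPVATQCHCGKCDSDSTDCTVRGLG"
         "PSYCSFGEMKE",
}

# One substitution such as "V4(a)S": wildtype residue, position, chain, new residue.
SUBSTITUTION = re.compile(r"([A-Z])(\d+)\(([ab])\)([A-Z])")


def sequon_starts(seq):
    """1-based positions of N-X-S/T motifs (X not proline)."""
    return {i + 1 for i in range(len(seq) - 2)
            if seq[i] == "N" and seq[i + 1] != "P" and seq[i + 2] in "ST"}


def apply_mutation(notation):
    """Apply a '+'-joined mutation such as 'P2(a)N+V4(a)S'; return (chain, mutant)."""
    chains = {}
    for part in notation.replace(" ", "").split("+"):
        match = SUBSTITUTION.fullmatch(part)
        if match is None:
            raise ValueError(f"cannot parse {part!r}")
        wildtype, pos, chain, new = match[1], int(match[2]), match[3], match[4]
        residues = chains.setdefault(chain, list(HFSH[chain]))
        if residues[pos - 1] != wildtype:
            raise ValueError(f"{part}: position {pos} is {residues[pos - 1]}, not {wildtype}")
        residues[pos - 1] = new
    if len(chains) != 1:
        raise ValueError("expected substitutions in exactly one subunit")
    chain, residues = next(iter(chains.items()))
    return chain, "".join(residues)


def introduced_sequons(notation):
    """Sequon start positions present in the mutant but absent from the wildtype subunit."""
    chain, mutant = apply_mutation(notation)
    return sorted(sequon_starts(mutant) - sequon_starts(HFSH[chain]))


for mutation in ["P2(a)N+V4(a)S", "E9(a)N", "S91(b)N+D93(b)T", "E108(b)N+K110(b)T"]:
    print(mutation, introduced_sequons(mutation))
```

Single substitutions such as E9(a)N work because the residue two positions downstream is already Thr or Ser.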

19. The conjugate according to any of claims 13-18, wherein a naturally occurring
glycosylation site has been removed from FSH-α and/or FSH-β.

20. The conjugate according to any of claims 1-19, wherein the amino acid sequence of
FSH-α and/or FSH-β differs in 1-15 amino acid residues from the corresponding wildtype
sequence.

21. The conjugate according to any of claims 1-20, which comprises at least one further
mutation in FSH-α and/or FSH-β, said mutation being neither an introduction nor a removal of
an amino acid residue comprising an attachment group for the non-polypeptide moiety.

22. The conjugate according to any of claims 15-21, which further comprises a
non-polypeptide moiety different from an N- or O-linked carbohydrate moiety.

23. The conjugate according to any of the preceding claims, which has reduced renal clear- 
ance as compared to hFSH. 

24. The conjugate according to any of the preceding claims, which has an increased func- 
tional in vivo half-life and/or serum half-life as compared to hFSH. 

25. The conjugate according to any of claims 1-24, comprising a sufficient number or type of 
non-polypeptide moieties to render the conjugate less susceptible to renal clearance than 
hFSH. 

26. The conjugate according to claim 25, wherein at least one of the non-polypeptide moieties 
is a polymer molecule. 

27. The conjugate according to any of claims 1-26, which has a molecular weight of at least 
about 67 kDa, in particular at least about 70 kDa. 
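Claims 25 and 27 link reduced renal clearance to the overall size of the conjugate. As a rough, illustrative calculation, the Python sketch below sums average residue masses for the two mature subunits and works out how many polymer chains of an assumed nominal mass would bring the total to the 67 kDa figure. Several assumptions apply. The 20 kDa polymer size is hypothetical, and disulfide bonds are ignored. The carbohydrate contribution, which is substantial for glycosylated FSH, must be supplied by the caller; it defaults to zero here.

```python
import math

# Average (not monoisotopic) residue masses in daltons, i.e. amino acid minus water.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once per chain for the free termini

# Mature subunits from SEQ ID NO 2 and SEQ ID NO 4.
HFSH_ALPHA = ("APDVQDCPECTLQENPFFSQPGAPILQCMGCCFSRAYPTPLRSKKTMLVQ"
              "KNVTSESTCCVAKSYNRVTVMGGFKVENHTACHCSTCYYHKS")
HFSH_BETA = ("NSCELTNITIAIEKEECRFCISINTTWCAGYCYTRDLVYKDPARPKIQKT"
             "CTFKELVYETVRVPGCAHHADSLYTYPVATQCHCGKCDSDSTDCTVRGLG"
             "PSYCSFGEMKE")


def polypeptide_mass(seq):
    """Average mass of an unmodified chain; disulfide bonds are ignored."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER


def polymers_needed(base_mass, polymer_mass, threshold=67_000.0):
    """Smallest n with base_mass + n * polymer_mass >= threshold (0 if already met)."""
    return max(0, math.ceil((threshold - base_mass) / polymer_mass))


backbone = polypeptide_mass(HFSH_ALPHA) + polypeptide_mass(HFSH_BETA)
glycan_estimate = 0.0  # hypothetical placeholder: supply a measured carbohydrate mass
peg_mass = 20_000.0    # hypothetical nominal PEG size

print(f"polypeptide backbone: {backbone / 1000:.1f} kDa")
print("chains needed:", polymers_needed(backbone + glycan_estimate, peg_mass))
```

Because the oligosaccharides are left out by default, the chain count printed here is an upper bound for a glycosylated conjugate.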
28. The conjugate according to any of claims 23-27, said conjugate being according to claim
1 or 2 having an oligosaccharide moiety as the only type of non-polypeptide moiety and hav- 
ing at least one removed N-glycosylation site, but no introduced N-glycosylation site. 

29. A substantially homogeneous preparation of a conjugate according to any of claims 1-28. 

30. FSH-α which has an amino acid sequence that differs from that of the corresponding
wildtype FSH-α subunit in that at least one amino acid residue comprising an attachment
group for a polymer molecule has been introduced and/or removed.

31. FSH-β which has an amino acid sequence that differs from that of the corresponding
wildtype FSH-β subunit in that at least one amino acid residue comprising an attachment
group for a polymer molecule has been introduced and/or removed.

32. The FSH subunit according to claim 30 or 31, wherein a non-naturally occurring N- 
glycosylation site has been introduced. 

33. The FSH subunit according to claim 32, wherein a naturally-occurring N-glycosylation 
site has been removed. 

34. The FSH subunit according to any of claims 30-33, which is glycosylated. 

35. A nucleotide sequence encoding a polypeptide according to any of claims 30-34. 

36. An expression vector harbouring a nucleotide sequence according to 
claim 35. 

37. A pair of expression vectors, each vector being capable of transfecting a eukaryotic cell,
the vectors comprising nucleotide sequences encoding, respectively, FSH-α according to
claim 35 and a wildtype FSH-β subunit, FSH-β according to claim 35 and a wildtype FSH-α
subunit, or FSH-α according to claim 35 and FSH-β according to claim 35.

38. A host cell comprising a nucleotide sequence according to claim 35, an expression vector 
according to claim 36, or a pair of expression vectors according to claim 37. 

39. The host cell according to claim 38, which is a eukaryotic cell. 

40. The host cell according to claim 39, which is a mammalian cell. 

41. A method for producing a modified FSH subunit according to any of claims 30-34, which 
method comprises subjecting the cell according to any of claims 38-40 comprising a nucleo- 
tide sequence encoding said modified subunit to cultivation under conditions conducive for 
expression of the subunit, and optionally recovering the subunit. 

42. The method according to claim 41, which further comprises subjecting the subunit to 
conjugation to a non-polypeptide moiety so as to produce a conjugate according to any of 
claims 1-28 or a preparation according to claim 29. 

43. The method according to claim 42, wherein the non-polypeptide moiety is a polymer 
molecule and the conjugation is performed in the presence of a molar excess of the polymer 
moiety relative to the polypeptide, whereby a substantially homogeneous preparation of con-
jugates is obtained. 
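The molar excess called for here converts into a reagent quantity by simple mole arithmetic. The Python sketch below is a hypothetical helper. The protein amount, molecular weights and 10-fold excess in the usage line are placeholders for illustration, not values taken from this document.

```python
def reagent_mass_mg(protein_mg, protein_mw, reagent_mw, molar_excess):
    """Milligrams of polymer reagent giving `molar_excess` moles per mole of polypeptide.

    protein_mg / protein_mw (g/mol) gives millimoles of polypeptide; multiplying by the
    excess and by the reagent's molecular weight converts back to milligrams.
    """
    if min(protein_mg, protein_mw, reagent_mw, molar_excess) <= 0:
        raise ValueError("all inputs must be positive")
    protein_mmol = protein_mg / protein_mw
    return protein_mmol * molar_excess * reagent_mw


# Placeholder numbers for illustration only.
print(round(reagent_mass_mg(protein_mg=1.0, protein_mw=30_000, reagent_mw=5_000,
                            molar_excess=10), 3))  # 1.667
```

If the excess is meant per attachment site rather than per polypeptide, the excess argument would be multiplied by the number of sites.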

44. A method for increasing the functional in vivo half-life and/or serum half-life of a poly- 
peptide exhibiting FSH activity, which method comprises introducing an amino acid residue 
change as defined in any of claims 1-28 and subjecting the resulting modified polypeptide to 
conjugation with an appropriate non-polypeptide moiety. 

45. A method for preparing a conjugate according to any of claims 1-28, comprising provid- 
ing a polypeptide i) and a non-polypeptide moiety ii), allowing the polypeptide to react with 
the non-polypeptide moiety under conditions conducive for conjugation to take place, and 
recovering the resulting conjugate. 

46. The method according to any of claims 41-45, wherein conjugation to the
non-polypeptide moiety is conducted in the presence of a molar excess of the non-polypeptide
moiety relative to the polypeptide, whereby a substantially homogeneous conjugate preparation
is obtained.

47. A method for preparing a polypeptide exhibiting FSH activity comprising a modified
FSH-α subunit according to any of claims 30 or 32-34 and a wildtype FSH-β subunit, a
modified FSH-β subunit according to any of claims 31-34 and a wildtype FSH-α subunit, or
a modified FSH-α subunit according to any of claims 30 or 32-34 and a modified FSH-β
subunit according to any of claims 31-34, which method comprises producing the respective
subunits separately and allowing the subunits to dimerize.

48. The method according to claim 47, which further comprises subjecting the resulting 
dimeric polypeptide to conjugation with a non-polypeptide moiety. 

49. A pharmaceutical composition comprising a) a conjugate according to any of claims 1-28 
or a preparation according to claim 29, and b) a pharmaceutically acceptable diluent, carrier 
or adjuvant. 

50. A conjugate according to any of claims 1-28, a preparation according to claim 29, or a 
composition according to claim 49 for use in the treatment of infertility. 

51. Use of a conjugate according to any of claims 1-28, a preparation according to claim 29, 
or a composition according to claim 49 for the treatment of infertility. 

52. Use of a conjugate according to any of claims 1-28, a preparation according to claim 29,
or a composition according to claim 49 for the manufacture of a medicament for treatment of
infertility.

53. A method of treating an infertile mammal comprising administering to a mammal in need
thereof an effective amount of a conjugate according to any of claims 1-28, a preparation
according to claim 29, or a composition according to claim 49.

54. A polypeptide conjugate exhibiting FSH activity, comprising a polypeptide comprising
FSH-α and FSH-β subunits, wherein at least one of said FSH-α and FSH-β subunits
comprises a polymer molecule bound to the N-terminal thereof.
55. The polypeptide conjugate of claim 54, wherein the polymer molecule is polyethylene glycol.

56. A polypeptide conjugate exhibiting FSH activity, comprising a polypeptide comprising
FSH-α and FSH-β subunits, wherein at least one of said FSH-α and FSH-β subunits
comprises at least one introduced N- or O-glycosylation site at the N-terminal thereof, said at
least one introduced glycosylation site being glycosylated.

57. The polypeptide conjugate of any of claims 54-56, wherein the FSH-α subunit comprises
hFSH-α having the sequence shown in SEQ ID NO 2 and/or the FSH-β subunit comprises
hFSH-β having the sequence shown in SEQ ID NO 4.

58. The polypeptide conjugate of claim 54 or 55, said conjugate further being as defined in 
any of claims 1-27. 

59. The polypeptide conjugate of claim 56, said conjugate further being as defined in any of 
claims 1-12 or 16-27. 
SEQUENCE LISTING

SEQ ID NO 1

The complete amino acid sequence of the common α chain, named "Glycoprotein hormones α
chain". Fiddes J.C., Goodman H.M. "Isolation, cloning and sequence analysis of the cDNA for
the α-subunit of human chorionic gonadotropin." Nature 281:351-356(1979).

MDYYRKYAAI FLVTLSVFLH VLHSAPDVQD CPECTLQENP FFSQPGAPIL
QCMGCCFSRA YPTPLRSKKT MLVQKNVTSE STCCVAKSYN RVTVMGGFKV
ENHTACHCST CYYHKS

Rathnam P., Saxena B.B. "Primary amino acid sequence of follicle-stimulating hormone from
human pituitary glands. I. α subunit." J. Biol. Chem. 250:6735-6746(1975), reports residue
Q29 to be a Glu.

Sairam M.R., Li C.H. "Human pituitary thyrotropin. The primary structure of the α and beta
subunits." Can. J. Biochem. 55:755-760(1977), and Sairam M.R., Papkoff H., Li C.H. "Human
pituitary interstitial cell stimulating hormone: primary structure of the α-subunit." Biochem.
Biophys. Res. Commun. 48:530-537(1972), report the sequence CS at positions 108-109 to be
the sequence SC.

SEQ ID NO 2

The mature amino acid sequence of the common α chain shown in SEQ ID NO 1.

APDVQDCPEC TLQENPFFSQ PGAPILQCMG CCFSRAYPTP LRSKKTMLVQ
KNVTSESTCC VAKSYNRVTV MGGFKVENHT ACHCSTCYYH KS

SEQ ID NO 3

The complete amino acid sequence of the human FSH β chain. Tanzi R.E., Gusella J.F., Shows
T.B. "DNA sequence and regional assignment of the human follicle-stimulating hormone beta-
subunit gene to the short arm of human chromosome 11." DNA 6:205-212(1987).

MKTLQFFFLF CCWKAICCNS CELTNITIAI EKEECRFCIS INTTWCAGYC
YTRDLVYKDP ARPKIQKTCT FKELVYETVR VPGCAHHADS LYTYPVATQC
HCGKCDSDST DCTVRGLGPS YCSFGEMKE

SEQ ID NO 4

The mature amino acid sequence of the human FSH β chain shown in SEQ ID NO 3.

NSCELTNITI AIEKEECRFC ISINTTWCAG YCYTRDLVYK DPARPKIQKT
CTFKELVYET VRVPGCAHHA DSLYTYPVAT QCHCGKCDSD STDCTVRGLG
PSYCSFGEMK E
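The listing gives each subunit both as a precursor (SEQ ID NO 1 and 3) and as a mature chain (SEQ ID NO 2 and 4). The literature notes above use precursor numbering, whereas the claims use mature numbering. The small Python check below uses hypothetical helper names. It confirms that each mature chain is the C-terminal part of its precursor, derives the signal-peptide lengths, and converts positions between the two numberings.

```python
SEQ_ID_1 = ("MDYYRKYAAIFLVTLSVFLHVLHSAPDVQDCPECTLQENPFFSQPGAPIL"
            "QCMGCCFSRAYPTPLRSKKTMLVQKNVTSESTCCVAKSYNRVTVMGGFKV"
            "ENHTACHCSTCYYHKS")
SEQ_ID_2 = ("APDVQDCPECTLQENPFFSQPGAPILQCMGCCFSRAYPTPLRSKKTMLVQ"
            "KNVTSESTCCVAKSYNRVTVMGGFKVENHTACHCSTCYYHKS")
SEQ_ID_3 = ("MKTLQFFFLFCCWKAICCNSCELTNITIAIEKEECRFCISINTTWCAGYC"
            "YTRDLVYKDPARPKIQKTCTFKELVYETVRVPGCAHHADSLYTYPVATQC"
            "HCGKCDSDSTDCTVRGLGPSYCSFGEMKE")
SEQ_ID_4 = ("NSCELTNITIAIEKEECRFCISINTTWCAGYCYTRDLVYKDPARPKIQKT"
            "CTFKELVYETVRVPGCAHHADSLYTYPVATQCHCGKCDSDSTDCTVRGLG"
            "PSYCSFGEMKE")


def signal_length(precursor, mature):
    """Number of N-terminal residues removed to give the mature chain."""
    if not precursor.endswith(mature):
        raise ValueError("mature chain is not the C-terminal part of the precursor")
    return len(precursor) - len(mature)


def precursor_to_mature(position, signal_len):
    """Convert a 1-based precursor position to mature numbering (<= 0 lies in the signal)."""
    return position - signal_len


alpha_signal = signal_length(SEQ_ID_1, SEQ_ID_2)
beta_signal = signal_length(SEQ_ID_3, SEQ_ID_4)
print(alpha_signal, beta_signal)
# Precursor Q29 and C108 of the alpha chain expressed in mature numbering:
print(precursor_to_mature(29, alpha_signal), precursor_to_mature(108, alpha_signal))
```

Under this conversion, precursor residue Q29 corresponds to Q5 of the mature α chain, and the CS pair at 108-109 corresponds to positions 84-85.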
FIGURE 1

Sequence alignments:

Sequence alignment of Human FSH to the structural part of the two structures of Human
Chorionic Gonadotropin. The "/" indicates the chain break between the alpha and the beta chain.

FSH   -QDCPECTLQ ENPFFSQPGA PILQCMGCCF SRAYPTPLRS KKTMLVQKNV
1HRP  TQDCPECTLQ ENPFFSQPGA PILQCMGCCF SRAYPTPLRS KKTMLVQKNV
1HCN  -QDCPECTLQ ENPFFSQPGA PILQCMGCCF SRAYPTPLRS KKTMLVQKNV

FSH   TSESTCCVAK SYNRVTVMGG FKVENHTACH CSTCYY/--- --NSCELTNI
1HRP  TSESTCCVAK SYNRVTVMGG FKVENHTACH CSTCYY/KEP LRPRCRPINA
1HCN  TSESTCCVAK SYNRVTVMGG FKVENHTACH CSTCYY/KEP LRPRCRPINA

FSH   TIAIEKEECR FCISINTTWC AGYCYTRDLV YKDPARPKIQ KTCTFKELVY
1HRP  TLAVEKEGCP VCITVNTTIC AGYCPTMTRV LQGVLPALPQ VVCNYRDVRF
1HCN  TLAVEKEGCP VCITVNTTIC AGYCPTMTRV LQGVLPALPQ VVCNYRDVRF

FSH   ETVRVPGCAH HADSLYTYPV ATQCHCGKCD SDSTDCTVRG LGPSYCSFGE
1HRP  ESIRLPGCPR GVNPVVSYAV ALSCQCALCR RSTTDCGGPK DHPLTCD...
1HCN  ESIRLPGCPR GVNPVVSYAV ALSCQCALCR RSTTDCGGPK DHPLTCD...
FIGURE 2 (p. 1/5) 



1 GACGGATCGG GAGATCTCCC GATCCCCTAT GGTCGACTCT CAGTACAATC 

CTGCCTAGCC CTCTAGAGGG CTAGGGGATA CCAGCTGAGA GTCATGTTAG 

51 TGCTCTGATG CCGCATAGTT AAGCCAGTAT CTGCTCCCTG CTTGTGTGTT 
ACGAGACTAC GGCGTATCAA TTCGGTCATA GACGAGGGAC GAACACACAA 

101 GGAGGTCGCT GAGTAGTGCG CGAGCAAAAT TTAAGCTACA ACAAGGCAAG 
CCTCCAGCGA CTCATCACGC GCTCGTTTTA AATTCGATGT TGTTCCGTTC 

151 GCTTGACCGA CAATTGCATG AAGAATCTGC TTAGGGTTAG GCGTTTTGCG 
CGAACTGGCT GTTAACGTAC TTCTTAGACG AATCCCAATC CGCAAAACGC 

201 CTGCTTCGCG ATGTACGGGC CAGATATACG CGTTGACATT GATTATTGAC 
GACGAAGCGC TACATGCCCG GTCTATATGC GCAACTGTAA CTAATAACTG 

251 TAGTTATTAA TAGTAATCAA TTACGGGGTC ATTAGTTCAT AGCCCATATA 
ATCAATAATT ATCATTAGTT AATGCCCCAG TAATCAAGTA TCGGGTATAT 

301 TGGAGTTCCG CGTTACATAA CTTACGGTAA ATGGCCCGCC TGGCTGACCG 
ACCTCAAGGC GCAATGTATT GAATGCCATT TACCGGGCGG ACCGACTGGC 

351 CCCAACGACC CCCGCCCATT GACGTCAATA ATGACGTATG TTCCCATAGT 
GGGTTGCTGG GGGCGGGTAA CTGCAGTTAT TACTGCATAC AAGGGTATCA 

401 AACGCCAATA GGGACTTTCC ATTGACGTCA ATGGGTGGAC TATTTACGGT 
TTGCGGTTAT CCCTGAAAGG TAACTGCAGT TACCCACCTG ATAAATGCCA 

451 AAACTGCCCA CTTGGCAGTA CATCAAGTGT ATCATATGCC AAGTACGCCC
TTTGACGGGT GAACCGTCAT GTAGTTCACA TAGTATACGG TTCATGCGGG 

501 CCTATTGACG TCAATGACGG TAAATGGCCC GCCTGGCATT ATGCCCAGTA 
GGATAACTGC AGTTACTGCC ATTTACCGGG CGGACCGTAA TACGGGTCAT

551 CATGACCTTA TGGGACTTTC CTACTTGGCA GTACATCTAC GTATTAGTCA 
GTACTGGAAT ACCCTGAAAG GATGAACCGT CATGTAGATG CATAATCAGT 

601 TCGCTATTAC CATGGTGATG CGGTTTTGGC AGTACATCAA TGGGCGTGGA 
AGCGATAATG GTACCACTAC GCCAAAACCG TCATGTAGTT ACCCGCACCT 

651 TAGCGGTTTG ACTCACGGGG ATTTCCAAGT CTCCACCCCA TTGACGTCAA 
ATCGCCAAAC TGAGTGCCCC TAAAGGTTCA GAGGTGGGGT AACTGCAGTT 

701 TGGGAGTTTG TTTTGGCACC AAAATCAACG GGACTTTCCA AAATGTCGTA
ACCCTCAAAC AAAACCGTGG TTTTAGTTGC CCTGAAAGGT TTTACAGCAT

751 ACAACTCCGC CCCATTGACG CAAATGGGCG GTAGGCGTGT ACGGTGGGAG
TGTTGAGGCG GGGTAACTGC GTTTACCCGC CATCCGCACA TGCCACCCTC

801 GTCTATATAA GCAGAGCTCT CTGGCTAACT AGAGAACCCA CTGCTTACTG 
CAGATATATT CGTCTCGAGA GACCGATTGA TCTCTTGGGT GACGAATGAC 

851 GCTTATCGAA ATTAATACGA CTCACTATAG GGAGACCCAA GCTGGCTAGC 
CGAATAGCTT TAATTATGCT GAGTGATATC CCTCTGGGTT CGACCGATCG 

901 TTATTGCGGT AGTTTATCAC AGTTAAATTG CTAACGCAGT CAGTGCTTCT 
AATAACGCCA TCAAATAGTG TCAATTTAAC GATTGCGTCA GTCACGAAGA 

951 GACACAACAG TCTCGAACTT AAGCTGCAGT GACTCTCTTA AGGTAGCCTT 

CTGTGTTGTC AGAGCTTGAA TTCGACGTCA CTGAGAGAAT TCCATCGGAA 

1001 GCAGAAGTTG GTCGTGAGGC ACTGGGCAGG TAAGTATCAA GGTTACAAGA 

CGTCTTCAAC CAGCACTCCG TGACCCGTCC ATTCATAGTT CCAATGTTCT 

1051 CAGGTTTAAG GAGACCAATA GAAACTGGGC TTGTCGAGAC AGAGAAGACT 

GTCCAAATTC CTCTGGTTAT CTTTGACCCG AACAGCTCTG TCTCTTCTGA 

1101 CTTGCGTTTC TGATAGGCAC CTATTGGTCT TACTGACATC CACTTTGCCT 

GAACGCAAAG ACTATCCGTG GATAACCAGA ATGACTGTAG GTGAAACGGA 

1151 TTCTCTCCAC AGGTGTCCAC TCCCAGTTCA ATTACAGCTC TTAAAAGCTT 

AAGAGAGGTG TCCACAGGTG AGGGTCAAGT TAATGTCGAG AATTTTCGAA 

Met Asp Tyr Tyr Arg Lys Tyr Ala Ala
1201 GGTACCGAGC TCGGATCCGC CACCATGGAC TACTACCGCA AGTACGCCGC
CCATGGCTCG AGCCTAGGCG GTGGTACCTG ATGATGGCGT TCATGCGGCG

Ala Ile Phe Leu Val Thr Leu Ser Val Phe Leu His Val Leu His Ser Ala Pro
1251 CATCTTCCTG GTGACCCTGA GCGTGTTCCT GCACGTGCTG CACAGCGCCC
GTAGAAGGAC CACTGGGACT CGCACAAGGA CGTGCACGAC GTGTCGCGGG

Pro Asp Val Gln Asp Cys Pro Glu Cys Thr Leu Gln Glu Asn Pro Phe Phe
1301 CCGACGTGCA GGACTGCCCC GAGTGCACCC TGCAGGAGAA CCCCTTCTTC
GGCTGCACGT CCTGACGGGG CTCACGTGGG ACGTCCTCTT GGGGAAGAAG

Ser Gln Pro Gly Ala Pro Ile Leu Gln Cys Met Gly Cys Cys Phe Ser Arg
1351 AGCCAGCCCG GCGCCCCCAT CCTGCAGTGC ATGGGCTGCT GCTTCAGCCG
TCGGTCGGGC CGCGGGGGTA GGACGTCACG TACCCGACGA CGAAGTCGGC

Arg Ala Tyr Pro Thr Pro Leu Arg Ser Lys Lys Thr Met Leu Val Gln Lys Asn
1401 CGCCTACCCC ACCCCCCTGC GCAGCAAGAA GACCATGCTG GTGCAGAAGA
GCGGATGGGG TGGGGGGACG CGTCGTTCTT CTGGTACGAC CACGTCTTCT

Asn Val Thr Ser Glu Ser Thr Cys Cys Val Ala Lys Ser Tyr Asn Arg Val
1451 ACGTGACCAG CGAGAGCACC TGCTGCGTGG CCAAGAGCTA CAACCGCGTG
TGCACTGGTC GCTCTCGTGG ACGACGCACC GGTTCTCGAT GTTGGCGCAC

Thr Val Met Gly Gly Phe Lys Val Glu Asn His Thr Ala Cys His Cys Ser
1501 ACCGTGATGG GCGGCTTCAA GGTGGAGAAC CACACCGCCT GCCACTGCAG
TGGCACTACC CGCCGAAGTT CCACCTCTTG GTGTGGCGGA CGGTGACGTC

Ser Thr Cys Tyr Tyr His Lys Ser
1551 CACCTGCTAC TACCACAAGA GCTAATCTAG AGGGCCCGTT TAAACCCGCT
GTGGACGATG ATGGTGTTCT CGATTAGATC TCCCGGGCAA ATTTGGGCGA

1601 GATCAGCCTC GACTGTGCCT TCTAGTTGCC AGCCATCTGT TGTTTGCCCC
CTAGTCGGAG CTGACACGGA AGATCAACGG TCGGTAGACA ACAAACGGGG

1651 TCCCCCGTGC CTTCCTTGAC CCTGGAAGGT GCCACTCCCA CTGTCCTTTC
AGGGGGCACG GAAGGAACTG GGACCTTCCA CGGTGAGGGT GACAGGAAAG

1701 CTAATAAAAT GAGGAAATTG CATCGCATTG TCTGAGTAGG TGTCATTCTA
GATTATTTTA CTCCTTTAAC GTAGCGTAAC AGACTCATCC ACAGTAAGAT

1751 TTCTGGGGGG TGGGGTGGGG CAGGACAGCA AGGGGGAGGA TTGGGAAGAC
AAGACCCCCC ACCCCACCCC GTCCTGTCGT TCCCCCTCCT AACCCTTCTG

1801 AATAGCAGGC ATGCTGGGGA TGCGGTGGGC TCTATGGCTT CTGAGGCGGA
TTATCGTCCG TACGACCCCT ACGCCACCCG AGATACCGAA GACTCCGCCT

1851 AAGAACCAGC TGGGGCTCTA GGGGGTATCC CCACGCGCCC TGTAGCGGCG
TTCTTGGTCG ACCCCGAGAT CCCCCATAGG GGTGCGCGGG ACATCGCCGC

1901 CATTAAGCGC GGCGGGTGTG GTGGTTACGC GCAGCGTGAC CGCTACACTT
GTAATTCGCG CCGCCCACAC CACCAATGCG CGTCGCACTG GCGATGTGAA

1951 GCCAGCGCCC TAGCGCCCGC TCCTTTCGCT TTCTTCCCTT CCTTTCTCGC
CGGTCGCGGG ATCGCGGGCG AGGAAAGCGA AAGAAGGGAA GGAAAGAGCG

2001 CACGTTCGCC GGCTTTCCCC GTCAAGCTCT AAATCGGGGC ATCCCTTTAG
GTGCAAGCGG CCGAAAGGGG CAGTTCGAGA TTTAGCCCCG TAGGGAAATC

2051 GGTTCCGATT TAGTGCTTTA CGGCACCTCG ACCCCAAAAA ACTTGATTAG
CCAAGGCTAA ATCACGAAAT GCCGTGGAGC TGGGGTTTTT TGAACTAATC

2101 GGTGATGGTT CACGTAGTGG GCCATCGCCC TGATAGACGG TTTTTCGCCC
CCACTACCAA GTGCATCACC CGGTAGCGGG ACTATCTGCC AAAAAGCGGG

2151 TTTGACGTTG GAGTCCACGT TCTTTAATAG TGGACTCTTG TTCCAAACTG
AAACTGCAAC CTCAGGTGCA AGAAATTATC ACCTGAGAAC AAGGTTTGAC

2201 GAACAACACT CAACCCTATC TCGGTCTATT CTTTTGATTT ATAAGGGATT
CTTGTTGTGA GTTGGGATAG AGCCAGATAA GAAAACTAAA TATTCCCTAA

2251 TTGGGGATTT CGGCCTATTG GTTAAAAAAT GAGCTGATTT AACAAAAATT
AACCCCTAAA GCCGGATAAC CAATTTTTTA CTCGACTAAA TTGTTTTTAA

2301 TAACGCGAAT TAATTCTGTG GAATGTGTGT CAGTTAGGGT GTGGAAAGTC
ATTGCGCTTA ATTAAGACAC CTTACACACA GTCAATCCCA CACCTTTCAG

2351 CCCAGGCTCC CCAGGCAGGC AGAAGTATGC AAAGCATGCA TCTCAATTAG
GGGTCCGAGG GGTCCGTCCG TCTTCATACG TTTCGTACGT AGAGTTAATC
2401 TCAGCAACCA GGTGTGGAAA GTCCCCAGGC TCCCCAGCAG GCAGAAGTAT
AGTCGTTGGT CCACACCTTT CAGGGGTCCG AGGGGTCGTC CGTCTTCATA 

2451 GCAAAGCATG CATCTCAATT AGTCAGCAAC CATAGTCCCG CCCCTAACTC 
CGTTTCGTAC GTAGAGTTAA TCAGTCGTTG GTATCAGGGC GGGGATTGAG 

2501 CGCCCATCCC GCCCCTAACT CCGCCCAGTT CCGCCCATTC TCCGCCCCAT 
GCGGGTAGGG CGGGGATTGA GGCGGGTCAA GGCGGGTAAG AGGCGGGGTA 

2551 GGCTGACTAA TTTTTTTTAT TTATGCAGAG GCCGAGGCCG CCTCTGCCTC 
CCGACTGATT AAAAAAAATA AATACGTCTC CGGCTCCGGC GGAGACGGAG

2601 TGAGCTATTC CAGAAGTAGT GAGGAGGCTT TTTTGGAGGC CTAGGCTTTT 
ACTCGATAAG GTCTTCATCA CTCCTCCGAA AAAACCTCCG GATCCGAAAA 

2651 GCAAAAAGCT CCCGGGAGCT TGTATATCCA TTTTCGGATC TGATCAGCAC 
CGTTTTTCGA GGGCCCTCGA ACATATAGGT AAAAGCCTAG ACTAGTCGTG 

2701 GTGATGAAAA AGCCTGAACT CACCGCGACG TCTGTCGAGA AGTTTCTGAT 
CACTACTTTT TCGGACTTGA GTGGCGCTGC AGACAGCTCT TCAAAGACTA 

2751 CGAAAAGTTC GACAGCGTCT CCGACCTGAT GCAGCTCTCG GAGGGCGAAG 
GCTTTTCAAG CTGTCGCAGA GGCTGGACTA CGTCGAGAGC CTCCCGCTTC 

2801 AATCTCGTGC TTTCAGCTTC GATGTAGGAG GGCGTGGATA TGTCCTGCGG 
TTAGAGCACG AAAGTCGAAG CTACATCCTC CCGCACCTAT ACAGGACGCC 

2851 GTAAATAGCT GCGCCGATGG TTTCTACAAA GATCGTTATG TTTATCGGCA 
CATTTATCGA CGCGGCTACC AAAGATGTTT CTAGCAATAC AAATAGCCGT 

2901 CTTTGCATCG GCCGCGCTCC CGATTCCGGA AGTGCTTGAC ATTGGGGAAT
GAAACGTAGC CGGCGCGAGG GCTAAGGCCT TCACGAACTG TAACCCCTTA

2951 TCAGCGAGAG CCTGACCTAT TGCATCTCCC GCCGTGCACA GGGTGTCACG 
AGTCGCTCTC GGACTGGATA ACGTAGAGGG CGGCACGTGT CCCACAGTGC 

3001 TTGCAAGACC TGCCTGAAAC CGAACTGCCC GCTGTTCTGC AGCCGGTCGC 
AACGTTCTGG ACGGACTTTG GCTTGACGGG CGACAAGACG TCGGCCAGCG 

3051 GGAGGCCATG GATGCGATCG CTGCGGCCGA TCTTAGCCAG ACGAGCGGGT 
CCTCCGGTAC CTACGCTAGC GACGCCGGCT AGAATCGGTC TGCTCGCCCA 

3101 TCGGCCCATT CGGACCGCAA GGAATCGGTC AATACACTAC ATGGCGTGAT 
AGCCGGGTAA GCCTGGCGTT CCTTAGCCAG TTATGTGATG TACCGCACTA 

3151 TTCATATGCG CGATTGCTGA TCCCCATGTG TATCACTGGC AAACTGTGAT 
AAGTATACGC GCTAACGACT AGGGGTACAC ATAGTGACCG TTTGACACTA

3201 GGACGACACC GTCAGTGCGT CCGTCGCGCA GGCTCTCGAT GAGCTGATGC 
CCTGCTGTGG CAGTCACGCA GGCAGCGCGT CCGAGAGCTA CTCGACTACG 

3251 TTTGGGCCGA GGACTGCCCC GAAGTCCGGC ACCTCGTGCA CGCGGATTTC 
AAACCCGGCT CCTGACGGGG CTTCAGGCCG TGGAGCACGT GCGCCTAAAG 

3301 GGCTCCAACA ATGTCCTGAC GGACAATGGC CGCATAACAG CGGTCATTGA 
CCGAGGTTGT TACAGGACTG CCTGTTACCG GCGTATTGTC GCCAGTAACT 

3351 CTGGAGCGAG GCGATGTTCG GGGATTCCCA ATACGAGGTC GCCAACATCT 
GACCTCGCTC CGCTACAAGC CCCTAAGGGT TATGCTCCAG CGGTTGTAGA 

3401 TCTTCTGGAG GCCGTGGTTG GCTTGTATGG AGCAGCAGAC GCGCTACTTC 
AGAAGACCTC CGGCACCAAC CGAACATACC TCGTCGTCTG CGCGATGAAG 

3451 GAGCGGAGGC ATCCGGAGCT TGCAGGATCG CCGCGGCTCC GGGCGTATAT
CTCGCCTCCG TAGGCCTCGA ACGTCCTAGC GGCGCCGAGG CCCGCATATA 

3501 GCTCCGCATT GGTCTTGACC AACTCTATCA GAGCTTGGTT GACGGCAATT 
CGAGGCGTAA CCAGAACTGG TTGAGATAGT CTCGAACCAA CTGCCGTTAA 

3551 TCGATGATGC AGCTTGGGCG CAGGGTCGAT GCGACGCAAT CGTCCGATCC 
AGCTACTACG TCGAACCCGC GTCCCAGCTA CGCTGCGTTA GCAGGCTAGG 

3601 GGAGCCGGGA CTGTCGGGCG TACACAAATC GCCCGCAGAA GCGCGGCCGT 
CCTCGGCCCT GACAGCCCGC ATGTGTTTAG CGGGCGTCTT CGCGCCGGCA 

3651 CTGGACCGAT GGCTGTGTAG AAGTACTCGC CGATAGTGGA AACCGACGCC 
GACCTGGCTA CCGACACATC TTCATGAGCG GCTATCACCT TTGGCTGCGG 
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3701 


CCAGCACTCG 
GGTCGTGAGC 


TCCGAGGGCA 
AGGCTCCCGT 


AAGGAATAGC 
TTCCTTATCG 


ACGTGCTACG 
TGCACGATGC 


AGATTTCGAT 
TCTAAAGCTA 




3751 


TCCACCGCCG 
AGGTGGCGGC 


CCTTCTATGA 
GGAAGATACT 


AAGGTTGGGC 
TTCCAACCCG 


TTCGGAATCG 
AAGCCTTAGC 


TTTTCCGGGA 
AAAAGGCCCT 





3801 


CGCCGGCTGG 
GCGGCCGACC 


ATGATCCTCC 
TACTAGGAGG 


AGCGCGGGGA 
TCGCGCCCCT 


TCTCATGCTG 
AGAGTACGAC 


GAGTTCTTCG 
CTCAAGAAGC 





3851 


CCCACCCCAA 
GGGTGGGGTT 


CTTGTTTATT 
GAACAAATAA 


GCAGCTTATA 
CGTCGAATAT 


ATGGTTACAA 
TACCAATGTT 


ATAAAGCAAT 
TATTTCGTTA 




3901 


AGCATCACAA 
TCGTAGTGTT 


ATTTCACAAA 
TAAAGTGTTT 


TAAAGCATTT 
ATTTCGTAAA 


TTTTCACTGC 
AAAAGTGACG 


ATTCTAGTTG 
TAAGATCAAC 





3951 


TGGTTTGTCC 
ACCAAACAGG 


AAACTCATCA 
TTTGAGTAGT 


ATGTATCTTA 
TACATAGAAT 


TCATGTCTGT 
AGTACAGACA 


ATACCGTCGA 
TATGGCAGCT 




4001 


CCTCTAGCTA 
GGAGATCGAT 


GAGCTTGGCG 
CTCGAACCGC 


TAATCATGGT 
ATTAGTACCA 


CATAGCTGTT 
GTATCGACAA 


TCCTGTGTGA 
AGGACACACT 




4051 


AATTGTTATC 
TTAACAATAG 


CGCTCACAAT 
GCGAGTGTTA 


TCCACACAAC 
AGGTGTGTTG 


ATACGAGCCG
TATGCTCGGC 


GAAGCATAAA 
CTTCGTATTT 




4101 


GTGTAAAGCC 
CACATTTCGG 


TGGGGTGCCT 
ACCCCACGGA 


AATGAGTGAG 
TTACTCACTC 


CTAACTCACA 
GATTGAGTGT 


TTAATTGCGT 
AATTAACGCA 




4151 


TGCGCTCACT 
ACGCGAGTGA 


GCCCGCTTTC 
CGGGCGAAAG 


CAGTCGGGAA 
GTCAGCCCTT 


ACCTGTCGTG 
TGGACAGCAC 


CCAGCTGCAT 
GGTCGACGTA 




4201 


TAATGAATCG 
ATTACTTAGC 


GCCAACGCGC 
CGGTTGCGCG 


GGGGAGAGGC 
CCCCTCTCCG 


GGTTTGCGTA 
CCAAACGCAT 


TTGGGCGCTC 
AACCCGCGAG 




4251 


TTCCGCTTCC 
AAGGCGAAGG 


TCGCTCACTG 
AGCGAGTGAC 


ACTCGCTGCG 
TGAGCGACGC 


CTCGGTCGTT 
GAGCCAGCAA 


CGGCTGCGGC 
GCCGACGCCG 





4301 


GAGCGGTATC 
CTCGCCATAG 


AGCTCACTCA 
TCGAGTGAGT 


AAGGCGGTAA 
TTCCGCCATT 


TACGGTTATC 
ATGCCAATAG 


CACAGAATCA 
GTGTCTTAGT 




4351 


GGGGATAACG 
CCCCTATTGC 


CAGGAAAGAA 
GTCCTTTCTT 


CATGTGAGCA 
GTACACTCGT 


AAAGGCCAGC 
TTTCCGGTCG 


AAAAGGCCAG 
TTTTCCGGTC 




4401 


GAACCGTAAA AAGGCCGCGT 
CTTGGCATTT TTCCGGCGCA 


TGCTGGCGTT 
ACGACCGCAA 


TTTCCATAGG 
AAAGGTATCC 


CTCCGCCCCC 
GAGGCGGGGG 




4451 


CTGACGAGCA 
GACTGCTCGT 


TCACAAAAAT 
AGTGTTTTTA 


CGACGCTCAA 
GCTGCGAGTT 


GTCAGAGGTG 
CAGTCTCCAC 


GCGAAACCCG 
CGCTTTGGGC 




4501 


ACAGGACTAT
TGTCCTGATA 


AAAGATACCA 
TTTCTATGGT 


GGCGTTTCCC 
CCGCAAAGGG 


CCTGGAAGCT 
GGACCTTCGA 


CCCTCGTGCG 
GGGAGCACGC 




4551 


CTCTCCTGTT 
GAGAGGACAA 


CCGACCCTGC 
GGCTGGGACG 


CGCTTACCGG 
GCGAATGGCC 


ATACCTGTCC 
TATGGACAGG 


GCCTTTCTCC 
CGGAAAGAGG 




4601 


CTTCGGGAAG 
GAAGCCCTTC 


CGTGGCGCTT 
GCACCGCGAA 


TCTCAATGCT 
AGAGTTACGA 


CACGCTGTAG 
GTGCGACATC 


GTATCTCAGT 
CATAGAGTCA 




4651 


TCGGTGTAGG 
AGCCACATCC 


TCGTTCGCTC 
AGCAAGCGAG 


CAAGCTGGGC 
GTTCGACCCG 


TGTGTGCACG 
ACACACGTGC 


AACCCCCCGT 
TTGGGGGGCA




4701 


TCAGCCCGAC 
AGTCGGGCTG 


CGCTGCGCCT 
GCGACGCGGA 


TATCCGGTAA 
ATAGGCCATT 


CTATCGTCTT 
GATAGCAGAA 


GAGTCCAACC 
CTCAGGTTGG 




4751 


CGGTAAGACA 
GCCATTCTGT 


CGACTTATCG 
GCTGAATAGC 


CCACTGGCAG 
GGTGACCGTC 


CAGCCACTGG 
GTCGGTGACC 


TAACAGGATT 
ATTGTCCTAA 




4801 


AGCAGAGCGA 
TCGTCTCGCT 


GGTATGTAGG 
CCATACATCC 


CGGTGCTACA 
GCCACGATGT 


GAGTTCTTGA 
CTCAAGAACT 


AGTGGTGGCC 
TCACCACCGG 




4851 


TAACTACGGC 
ATTGATGCCG 


TACACTAGAA 
ATGTGATCTT 


GGACAGTATT 
CCTGTCATAA 


TGGTATCTGC 
ACCATAGACG 


GCTCTGCTGA 
CGAGACGACT 




4901 


AGCCAGTTAC 
TCGGTCAATG 


CTTCGGAAAA 
GAAGCCTTTT 


AGAGTTGGTA 
TCTCAACCAT 


GCTCTTGATC 
CGAGAACTAG 


CGGCAAACAA 
GCCGTTTGTT 




4951 


ACCACCGCTG 
TGGTGGCGAC 


GTAGCGGTGG 
CATCGCCACC 


TTTTTTTGTT 
AAAAAAACAA 


TGCAAGCAGC 
ACGTTCGTCG 


AGATTACGCG 
TCTAATGCGC 
5001 CAGAAAAAAA GGATCTCAAG AAGATCCTTT GATCTTTTCT ACGGGGTCTG
GTCTTTTTTT CCTAGAGTTC TTCTAGGAAA CTAGAAAAGA TGCCCCAGAC 

5051 ACGCTCAGTG GAACGAAAAC TCACGTTAAG GGATTTTGGT CATGAGATTA 
TGCGAGTCAC CTTGCTTTTG AGTGCAATTC CCTAAAACCA GTACTCTAAT 

5101 TCAAAAAGGA TCTTCACCTA GATCCTTTTA AATTAAAAAT GAAGTTTTAA 
AGTTTTTCCT AGAAGTGGAT CTAGGAAAAT TTAATTTTTA CTTCAAAATT 

5151 ATCAATCTAA AGTATATATG AGTAAACTTG GTCTGACAGT TACCAATGCT 
TAGTTAGATT TCATATATAC TCATTTGAAC CAGACTGTCA ATGGTTACGA 

5201 TAATCAGTGA GGCACCTATC TCAGCGATCT GTCTATTTCG TTCATCCATA 
ATTAGTCACT CCGTGGATAG AGTCGCTAGA CAGATAAAGC AAGTAGGTAT

5251 GTTGCCTGAC TCCCCGTCGT GTAGATAACT ACGATACGGG AGGGCTTACC 
CAACGGACTG AGGGGCAGCA CATCTATTGA TGCTATGCCC TCCCGAATGG

5301 ATCTGGCCCC AGTGCTGCAA TGATACCGCG AGACCCACGC TCACCGGCTC 
TAGACCGGGG TCACGACGTT ACTATGGCGC TCTGGGTGCG AGTGGCCGAG 

5351 CAGATTTATC AGCAATAAAC CAGCCAGCCG GAAGGGCCGA GCGCAGAAGT 
GTCTAAATAG TCGTTATTTG GTCGGTCGGC CTTCCCGGCT CGCGTCTTCA 

5401 GGTCCTGCAA CTTTATCCGC CTCCATCCAG TCTATTAATT GTTGCCGGGA 
CCAGGACGTT GAAATAGGCG GAGGTAGGTC AGATAATTAA CAACGGCCCT 

5451 AGCTAGAGTA AGTAGTTCGC CAGTTAATAG TTTGCGCAAC GTTGTTGCCA 
TCGATCTCAT TCATCAAGCG GTCAATTATC AAACGCGTTG CAACAACGGT

5501 TTGCTACAGG CATCGTGGTG TCACGCTCGT CGTTTGGTAT GGCTTCATTC 
AACGATGTCC GTAGCACCAC AGTGCGAGCA GCAAACCATA CCGAAGTAAG

5551 AGCTCCGGTT CCCAACGATC AAGGCGAGTT ACATGATCCC CCATGTTGTG 
TCGAGGCCAA GGGTTGCTAG TTCCGCTCAA TGTACTAGGG GGTACAACAC 

5601 CAAAAAAGCG GTTAGCTCCT TCGGTCCTCC GATCGTTGTC AGAAGTAAGT
GTTTTTTCGC CAATCGAGGA AGCCAGGAGG CTAGCAACAG TCTTCATTCA 

5651 TGGCCGCAGT GTTATCACTC ATGGTTATGG CAGCACTGCA TAATTCTCTT 
ACCGGCGTCA CAATAGTGAG TACCAATACC GTCGTGACGT ATTAAGAGAA 

5701 ACTGTCATGC CATCCGTAAG ATGCTTTTCT GTGACTGGTG AGTACTCAAC 
TGACAGTACG GTAGGCATTC TACGAAAAGA CACTGACCAC TCATGAGTTG 

5751 CAAGTCATTC TGAGAATAGT GTATGCGGCG ACCGAGTTGC TCTTGCCCGG 
GTTCAGTAAG ACTCTTATCA CATACGCCGC TGGCTCAACG AGAACGGGCC 

5801 CGTCAATACG GGATAATACC GCGCCACATA GCAGAACTTT AAAAGTGCTC 
GCAGTTATGC CCTATTATGG CGCGGTGTAT CGTCTTGAAA TTTTCACGAG

5851 ATCATTGGAA AACGTTCTTC GGGGCGAAAA CTCTCAAGGA TCTTACCGCT 
TAGTAACCTT TTGCAAGAAG CCCCGCTTTT GAGAGTTCCT AGAATGGCGA 

5901 GTTGAGATCC AGTTCGATGT AACCCACTCG TGCACCCAAC TGATCTTCAG 
CAACTCTAGG TCAAGCTACA TTGGGTGAGC ACGTGGGTTG ACTAGAAGTC 

5951 CATCTTTTAC TTTCACCAGC GTTTCTGGGT GAGCAAAAAC AGGAAGGCAA 
GTAGAAAATG AAAGTGGTCG CAAAGACCCA CTCGTTTTTG TCCTTCCGTT 

6001 AATGCCGCAA AAAAGGGAAT AAGGGCGACA CGGAAATGTT GAATACTCAT 
TTACGGCGTT TTTTCCCTTA TTCCCGCTGT GCCTTTACAA CTTATGAGTA 

6051 ACTCTTCCTT TTTCAATATT ATTGAAGCAT TTATCAGGGT TATTGTCTCA 
TGAGAAGGAA AAAGTTATAA TAACTTCGTA AATAGTCCCA ATAACAGAGT

6101 TGAGCGGATA CATATTTGAA TGTATTTAGA AAAATAAACA AATAGGGGTT 
ACTCGCCTAT GTATAAACTT ACATAAATCT TTTTATTTGT TTATCCCCAA 

6151 CCGCGCACAT TTCCCCGAAA AGTGCCACCT GACGTC 
GGCGCGTGTA AAGGGGCTTT TCACGGTGGA CTGCAG 
1


GACGGATCGG 
CTGCCTAGCC 


GAGATCTCCC 
CTCTAGAGGG 


GATCCCCTAT 
CTAGGGGATA 


GGTCGACTCT 
CCAGCTGAGA 


CAGTACAATC 
GTCATGTTAG




51 


TGCTCTGATG 
ACGAGACTAC 


CCGCATAGTT 
GGCGTATCAA 


AAGCCAGTAT 
TTCGGTCATA 


CTGCTCCCTG 
GACGAGGGAC 


CTTGTGTGTT 
GAACACACAA 




101 


GGAGGTCGCT 
CCTCCAGCGA 


GAGTAGTGCG 
CTCATCACGC 


CGAGCAAAAT 
GCTCGTTTTA 


TTAAGCTACA 
AATTCGATGT 


ACAAGGCAAG 
TGTTCCGTTC 




151 


GCTTGACCGA 
CGAACTGGCT 


CAATTGCATG 
GTTAACGTAC 


AAGAATCTGC 
TTCTTAGACG 


TTAGGGTTAG 
AATCCCAATC 


GCGTTTTGCG 
CGCAAAACGC 




201 


CTGCTTCGCG 
GACGAAGCGC 


ATGTACGGGC 
TACATGCCCG 


CAGATATACG 
GTCTATATGC 


CGTTGACATT 
GCAACTGTAA 


GATTATTGAC 
CTAATAACTG 




251 


TAGTTATTAA 
ATCAATAATT 


TAGTAATCAA 
ATCATTAGTT 


TTACGGGGTC 
AATGCCCCAG 


ATTAGTTCAT 
TAATCAAGTA 


AGCCCATATA 
TCGGGTATAT 




301 


TGGAGTTCCG 
ACCTCAAGGC 


CGTTACATAA 
GCAATGTATT 


CTTACGGTAA 
GAATGCCATT 


ATGGCCCGCC 
TACCGGGCGG 


TGGCTGACCG 
ACCGACTGGC 





351 


CCCAACGACC 
GGGTTGCTGG 


CCCGCCCATT 
GGGCGGGTAA 


GACGTCAATA 
CTGCAGTTAT 


ATGACGTATG 
TACTGCATAC 


TTCCCATAGT 
AAGGGTATCA 




401 


AACGCCAATA 
TTGCGGTTAT 


GGGACTTTCC 
CCCTGAAAGG 


ATTGACGTCA 
TAACTGCAGT 


ATGGGTGGAC 
TACCCACCTG 


TATTTACGGT 
ATAAATGCCA 




451 


AAACTGCCCA 
TTTGACGGGT 


CTTGGCAGTA 
GAACCGTCAT 


CATCAAGTGT 
GTAGTTCACA 


ATCATATGCC 
TAGTATACGG 


AAGTACGCCC 
TTCATGCGGG 





501 


CCTATTGACG 
GGATAACTGC 


TCAATGACGG 
AGTTACTGCC 


TAAATGGCCC 
ATTTACCGGG 


GCCTGGCATT 
CGGACCGTAA 


ATGCCCAGTA 
TACGGGTCAT 




551 


CATGACCTTA 
GTACTGGAAT 


TGGGACTTTC 
ACCCTGAAAG 


CTACTTGGCA 
GATGAACCGT 


GTACATCTAC 
CATGTAGATG 


GTATTAGTCA 
CATAATCAGT 




601 


TCGCTATTAC 
AGCGATAATG 


CATGGTGATG 
GTACCACTAC 


CGGTTTTGGC 
GCCAAAACCG 


AGTACATCAA 
TCATGTAGTT 


TGGGCGTGGA 
ACCCGCACCT 




651 


TAGCGGTTTG 
ATCGCCAAAC 


ACTCACGGGG 
TGAGTGCCCC 


ATTTCCAAGT 
TAAAGGTTCA 


CTCCACCCCA 
GAGGTGGGGT 


TTGACGTCAA 
AACTGCAGTT 





701 


TGGGAGTTTG 
ACCCTCAAAC 


TTTTGGCACC 
AAAACCGTGG 


AAAATCAACG 
TTTTAGTTGC 


GGACTTTCCA 
CCTGAAAGGT 


AAATGTCGTA 
TTTACAGCAT 




751 


ACAACTCCGC 
TGTTGAGGCG 


CCCATTGACG 
GGGTAACTGC 


CAAATGGGCG 
GTTTACCCGC 


GTAGGCGTGT 
CATCCGCACA 


ACGGTGGGAG 
TGCCACCCTC 




801 


GTCTATATAA 
CAGATATATT 


GCAGAGCTCT 
CGTCTCGAGA 


CTGGCTAACT 
GACCGATTGA 


AGAGAACCCA 
TCTCTTGGGT 


CTGCTTACTG 
GACGAATGAC 




851 


GCTTATCGAA 
CGAATAGCTT 


ATTAATACGA 
TAATTATGCT 


CTCACTATAG 
GAGTGATATC 


GGAGACCCAA 
CCTCTGGGTT 


GCTGGCTAGC 
CGACCGATCG 




901 


TTATTGCGGT 
AATAACGCCA 


AGTTTATCAC 
TCAAATAGTG 


AGTTAAATTG
TCAATTTAAC 


CTAACGCAGT 
GATTGCGTCA 


CAGTGCTTCT 
GTCACGAAGA 




951 


GACACAACAG 
CTGTGTTGTC 


TCTCGAACTT 
AGAGCTTGAA 


AAGCTGCAGT 
TTCGACGTCA 


GACTCTCTTA 
CTGAGAGAAT 


AGGTAGCCTT 
TCCATCGGAA 





1001 


GCAGAAGTTG 
CGTCTTCAAC 


GTCGTGAGGC 
CAGCACTCCG 


ACTGGGCAGG 
TGACCCGTCC 


TAAGTATCAA 
ATTCATAGTT 


GGTTACAAGA 
CCAATGTTCT 




1051 


CAGGTTTAAG 
GTCCAAATTC 


GAGACCAATA 
CTCTGGTTAT 


GAAACTGGGC 
CTTTGACCCG 


TTGTCGAGAC 
AACAGCTCTG 


AGAGAAGACT 
TCTCTTCTGA 




1101 


CTTGCGTTTC 
GAACGCAAAG 


TGATAGGCAC 
ACTATCCGTG 


CTATTGGTCT 
GATAACCAGA 


TACTGACATC 
ATGACTGTAG 


CACTTTGCCT 
GTGAAACGGA 




1151 


TTCTCTCCAC 
AAGAGAGGTG 


AGGTGTCCAC 
TCCACAGGTG 


TCCCAGTTCA 
AGGGTCAAGT 


ATTACAGCTC 
TAATGTCGAG 


TTAAAAGCTT 
AATTTTCGAA 




Met Glu Thr Leu Gln Phe Phe




1201 


GGTACCGAGC 
CCATGGCTCG 


TCGGATCTAT 
AGCCTAGATA 


CGATGCCACC 
GCTACGGTGG 


ATGGAGACCC 
TACCTCTGGG 


TGCAGTTCTT 
ACGTCAAGAA 
Phe Phe Leu Phe Cys Cys Trp Lys Ala Ile Cys Cys Asn Ser Cys Glu Leu Thr



1251 


CTTCCTGTTC 


TGCTGCTGGA 


AGGCCATCTG 


CTGCAACAGC 


TGCGAGCTGA 






GAAGGACAAG 


ACGACGACCT 


TCCGGTAGAC 


GACGTTGTCG 


ACGCTCGACT 




Thr Asn Ile Thr Ile Ala Ile Glu Lys Glu Glu Cys Arg Phe Cys Ile Ser




1301 


CCAACATCAC 


CATCGCCATC 


GAGAAGGAGG 


AGTGCCGCTT 


CTGCATCAGC 






GGTTGTAGTG 


GTAGCGGTAG 


CTCTTCCTCC 


TCACGGCGAA 


GACGTAGTCG 






Ile Asn Thr Thr Trp Cys Ala Gly Tyr Cys Tyr Thr Arg Asp Leu Val Tyr




1351 


ATCAACACCA 


CCTGGTGCGC 


CGGCTACTGC 


TACACCCGCG 


ACCTGGTGTA 






TAGTTGTGGT 


GGACCACGCG 


GCCGATGACG 


ATGTGGGCGC 


TGGACCACAT 






Tyr Lys Asp Pro Ala Arg Pro Lys Ile Gln Lys Thr Cys Thr Phe Lys Glu Leu




1401 


CAAGGACCCC 


GCCCGCCCCA 


AGATCCAGAA 


GACCTGCACC 


TTCAAGGAGC 






GTTCCTGGGG 


CGGGCGGGGT 


TCTAGGTCTT 


CTGGACGTGG 


AAGTTCCTCG 




Leu Val Tyr Glu Thr Val Arg Val Pro Gly Cys Ala His His Ala Asp Ser




1451 


TGGTGTACGA 


GACGGTCCGG 


GTGCCCGGCT 


GCGCCCACCA 


CGCCGACAGC 






ACCACATGCT 


CTGCCAGGCC 


CACGGGCCGA 


CGCGGGTGGT 


GCGGCTGTCG 




Leu Tyr Thr Tyr Pro Val Ala Thr Gln Cys His Cys Gly Lys Cys Asp Ser




1501 


CTGTACACCT 


ACCCCGTGGC 


CACCCAGTGC 


CACTGCGGCA 


AGTGCGACAG 






GACATGTGGA 


TGGGGCACCG 


GTGGGTCACG 


GTGACGCCGT 


TCACGCTGTC 




Ser Asp Ser Thr Asp Cys Thr Val Arg Gly Leu Gly Pro Ser Tyr Cys Ser Phe




1551 


CGACAGCACC 


GACTGCACCG 


TGCGCGGCCT 


GGGCCCCAGC 


TACTGCAGCT 






GCTGTCGTGG 


CTGACGTGGC 


ACGCGCCGGA 


CCCGGGGTCG 


ATGACGTCGA 






Phe Gly Glu Met Lys Glu *










1601 


TCGGCGAGAT 


GAAGGAGTAA 


CTCGAGACTA 


GAGGGCCCGT 


TTAAACCCGC 






AGCCGCTCTA 


CTTCCTCATT 


GAGCTCTGAT 


CTCCCGGGCA 


AATTTGGGCG 




1651 


TGATCAGCCT 


CGACTGTGCC 


TTCTAGTTGC 


CAGCCATCTG 


TTGTTTGCCC 






ACTAGTCGGA 


GCTGACACGG 


AAGATCAACG 


GTCGGTAGAC 


AACAAACGGG 




1701 


CTCCCCCGTG 


CCTTCCTTGA 


CCCTGGAAGG 


TGCCACTCCC 


ACTGTCCTTT 






GAGGGGGCAC 


GGAAGGAACT 


GGGACCTTCC 


ACGGTGAGGG 


TGACAGGAAA 




1751 


CCTAATAAAA 


TGAGGAAATT 


GCATCGCATT 


GTCTGAGTAG 


GTGTCATTCT 






GGATTATTTT 


ACTCCTTTAA 


CGTAGCGTAA 


CAGACTCATC 


CACAGTAAGA 




1801 


ATTCTGGGGG 


GTGGGGTGGG 


GCAGGACAGC 


AAGGGGGAGG 


ATTGGGAAGA 






TAAGACCCCC 


CACCCCACCC 


CGTCCTGTCG 


TTCCCCCTCC 


TAACCCTTCT 




1851 


CAATAGCAGG 


CATGCTGGGG 


ATGCGGTGGG 


CTCTATGGCT 


TCTGAGGCGG 






GTTATCGTCC 


GTACGACCCC 


TACGCCACCC 


GAGATACCGA 


AGACTCCGCC 




1901 


AAAGAACCAG 


CTGGGGCTCT 


AGGGGGTATC 


CCCACGCGCC 


CTGTAGCGGC 






TTTCTTGGTC 


GACCCCGAGA 


TCCCCCATAG 


GGGTGCGCGG 


GACATCGCCG 




1951 


GCATTAAGCG 


CGGCGGGTGT 


GGTGGTTACG 


CGCAGCGTGA 


CCGCTACACT 






CGTAATTCGC 


GCCGCCCACA 


CCACCAATGC 


GCGTCGCACT 


GGCGATGTGA 




2001 


TGCCAGCGCC 


CTAGCGCCCG 


CTCCTTTCGC 


TTTCTTCCCT 


TCCTTTCTCG 






ACGGTCGCGG 


GATCGCGGGC 


GAGGAAAGCG 


AAAGAAGGGA 


AGGAAAGAGC 






CCACGTTCGC 


CGGCTTTCCC 


CGTCAAGCTC 


TAAATCGGGG 


CATCCCTTTA 






GGTGCAAGCG 


GCCGAAAGGG 


GCAGTTCGAG 


ATTTAGCCCC 


GTAGGGAAAT 




2101 


GGGTTCCGAT 


TTAGTGCTTT 


ACGGCACCTC 


GACCCCAAAA 


AACTTGATTA 






CCCAAGGCTA 


AATCACGAAA


TGCCGTGGAG 


CTGGGGTTTT 


TTGAACTAAT 




2151 


GGGTGATGGT 


TCACGTAGTG 


GGCCATCGCC 


CTGATAGACG 


GTTTTTCGCC 






CCCACTACCA 


AGTGCATCAC 


CCGGTAGCGG 


GACTATCTGC 


CAAAAAGCGG 





2201 CTTTGACGTT GGAGTCCACG TTCTTTAATA GTGGACTCTT GTTCCAAACT 
GAAACTGCAA CCTCAGGTGC AAGAAATTAT CACCTGAGAA CAAGGTTTGA 



2251 GGAACAACAC TCAACCCTAT CTCGGTCTAT TCTTTTGATT TATAAGGGAT 
CCTTGTTGTG AGTTGGGATA GAGCCAGATA AGAAAACTAA ATATTCCCTA 

2301 TTTGGGGATT TCGGCCTATT GGTTAAAAAA TGAGCTGATT TAACAAAAAT 
AAACCCCTAA AGCCGGATAA CCAATTTTTT ACTCGACTAA ATTGTTTTTA 
2351 TTAACGCGAA TTAATTCTGT GGAATGTGTG TCAGTTAGGG TGTGGAAAGT
AATTGCGCTT AATTAAGACA CCTTACACAC AGTCAATCCC ACACCTTTCA 

2401 CCCCAGGCTC CCCAGGCAGG CAGAAGTATG CAAAGCATGC ATCTCAATTA 
GGGGTCCGAG GGGTCCGTCC GTCTTCATAC GTTTCGTACG TAGAGTTAAT 

2451 GTCAGCAACC AGGTGTGGAA AGTCCCCAGG CTCCCCAGCA GGCAGAAGTA
CAGTCGTTGG TCCACACCTT TCAGGGGTCC GAGGGGTCGT CCGTCTTCAT 

2501 TGCAAAGCAT GCATCTCAAT TAGTCAGCAA CCATAGTCCC GCCCCTAACT 
ACGTTTCGTA CGTAGAGTTA ATCAGTCGTT GGTATCAGGG CGGGGATTGA 

2551 CCGCCCATCC CGCCCCTAAC TCCGCCCAGT TCCGCCCATT CTCCGCCCCA 
GGCGGGTAGG GCGGGGATTG AGGCGGGTCA AGGCGGGTAA GAGGCGGGGT

2601 TGGCTGACTA ATTTTTTTTA TTTATGCAGA GGCCGAGGCC GCCTCTGCCT 
ACCGACTGAT TAAAAAAAAT AAATACGTCT CCGGCTCCGG CGGAGACGGA 

2651 CTGAGCTATT CCAGAAGTAG TGAGGAGGCT TTTTTGGAGG CCTAGGCTTT 
GACTCGATAA GGTCTTCATC ACTCCTCCGA AAAAACCTCC GGATCCGAAA 

2701 TGCAAAAAGC TCCCGGGAGC TTGTATATCC ATTTTCGGAT CTGATCAGCA 
ACGTTTTTCG AGGGCCCTCG AACATATAGG TAAAAGCCTA GACTAGTCGT 

2751 CGTGTTGACA ATTAATCATC GGCATAGTAT ATCGGCATAG TATAATACGA
GCACAACTGT TAATTAGTAG CCGTATCATA TAGCCGTATC ATATTATGCT 

2801 CAAGGTGAGG AACTAAACCA TGGCCAAGTT GACCAGTGCC GTTCCGGTGC 
GTTCCACTCC TTGATTTGGT ACCGGTTCAA CTGGTCACGG CAAGGCCACG 

2851 TCACCGCGCG CGACGTCGCC GGAGCGGTCG AGTTCTGGAC CGACCGGCTC 
AGTGGCGCGC GCTGCAGCGG CCTCGCCAGC TCAAGACCTG GCTGGCCGAG 

2901 GGGTTCTCCC GGGACTTCGT GGAGGACGAC TTCGCCGGTG TGGTCCGGGA 
CCCAAGAGGG CCCTGAAGCA CCTCCTGCTG AAGCGGCCAC ACCAGGCCCT 

2951 CGACGTGACC CTGTTCATCA GCGCGGTCCA GGACCAGGTG GTGCCGGACA 
GCTGCACTGG GACAAGTAGT CGCGCCAGGT CCTGGTCCAC CACGGCCTGT 

3001 ACACCCTGGC CTGGGTGTGG GTGCGCGGCC TGGACGAGCT GTACGCCGAG 
TGTGGGACCG GACCCACACC CACGCGCCGG ACCTGCTCGA CATGCGGCTC 

3051 TGGTCGGAGG TCGTGTCCAC GAACTTCCGG GACGCCTCCG GGCCGGCCAT 
ACCAGCCTCC AGCACAGGTG CTTGAAGGCC CTGCGGAGGC CCGGCCGGTA 

3101 GACCGAGATC GGCGAGCAGC CGTGGGGGCG GGAGTTCGCC CTGCGCGACC 
CTGGCTCTAG CCGCTCGTCG GCACCCCCGC CCTCAAGCGG GACGCGCTGG 

3151 CGGCCGGCAA CTGCGTGCAC TTCGTGGCCG AGGAGCAGGA CTGACACGTG 
GCCGGCCGTT GACGCACGTG AAGCACCGGC TCCTCGTCCT GACTGTGCAC 

3201 CTACGAGATT TCGATTCCAC CGCCGCCTTC TATGAAAGGT TGGGCTTCGG 
GATGCTCTAA AGCTAAGGTG GCGGCGGAAG ATACTTTCCA ACCCGAAGCC 

3251 AATCGTTTTC CGGGACGCCG GCTGGATGAT CCTCCAGCGC GGGGATCTCA 
TTAGCAAAAG GCCCTGCGGC CGACCTACTA GGAGGTCGCG CCCCTAGAGT 

3301 TGCTGGAGTT CTTCGCCCAC CCCAACTTGT TTATTGCAGC TTATAATGGT 
ACGACCTCAA GAAGCGGGTG GGGTTGAACA AATAACGTCG AATATTACCA 

3351 TACAAATAAA GCAATAGCAT CACAAATTTC ACAAATAAAG CATTTTTTTC 
ATGTTTATTT CGTTATCGTA GTGTTTAAAG TGTTTATTTC GTAAAAAAAG 

3401 ACTGCATTCT AGTTGTGGTT TGTCCAAACT CATCAATGTA TCTTATCATG
TGACGTAAGA TCAACACCAA ACAGGTTTGA GTAGTTACAT AGAATAGTAC 

3451 TCTGTATACC GTCGACCTCT AGCTAGAGCT TGGCGTAATC ATGGTCATAG
AGACATATGG CAGCTGGAGA TCGATCTCGA ACCGCATTAG TACCAGTATC 

3501 CTGTTTCCTG TGTGAAATTG TTATCCGCTC ACAATTCCAC ACAACATACG 
GACAAAGGAC ACACTTTAAC AATAGGCGAG TGTTAAGGTG TGTTGTATGC 

3551 AGCCGGAAGC ATAAAGTGTA AAGCCTGGGG TGCCTAATGA GTGAGCTAAC 
TCGGCCTTCG TATTTCACAT TTCGGACCCC ACGGATTACT CACTCGATTG 

3601 TCACATTAAT TGCGTTGCGC TCACTGCCCG CTTTCCAGTC GGGAAACCTG 
AGTGTAATTA ACGCAACGCG AGTGACGGGC GAAAGGTCAG CCCTTTGGAC 
3651


TCGTGCCAGC 
AGCACGGTCG 


TGCATTAATG 
ACGTAATTAC 


AATCGGCCAA 
TTAGCCGGTT 


CGCGCGGGGA GAGGCGGTTT
GCGCGCCCCT CTCCGCCAAA




3701 


GCGTATTGGG 
CGCATAACCC 


CGCTCTTCCG 
GCGAGAAGGC 


CTTCCTCGCT 
GAAGGAGCGA 


CACTGACTCG CTGCGCTCGG 
GTGACTGAGC GACGCGAGCC 




3751 


TCGTTCGGCT 
AGCAAGCCGA 


GCGGCGAGCG 
CGCCGCTCGC 


GTATCAGCTC 
CATAGTCGAG 


ACTCAAAGGC GGTAATACGG 
TGAGTTTCCG CCATTATGCC 





3801 


TTATCCACAG 
AATAGGTGTC 


AATCAGGGGA 
TTAGTCCCCT 


TAACGCAGGA 
ATTGCGTCCT 


AAGAACATGT GAGCAAAAGG 
TTCTTGTACA CTCGTTTTCC 





3851 


CCAGCAAAAG 
GGTCGTTTTC 


GCCAGGAACC 
CGGTCCTTGG 


GTAAAAAGGC 
CATTTTTCCG 


CGCGTTGCTG GCGTTTTTCC 
GCGCAACGAC CGCAAAAAGG 




3901 


ATAGGCTCCG 
TATCCGAGGC 


CCCCCCTGAC 
GGGGGGACTG 


GAGCATCACA 
CTCGTAGTGT 


AAAATCGACG CTCAAGTCAG 
TTTTAGCTGC GAGTTCAGTC 





3951 


AGGTGGCGAA 
TCCACCGCTT 


ACCCGACAGG 
TGGGCTGTCC 


ACTATAAAGA 
TGATATTTCT 


TACCAGGCGT TTCCCCCTGG 
ATGGTCCGCA AAGGGGGACC 




4001 


AAGCTCCCTC 
TTCGAGGGAG 


GTGCGCTCTC 
CACGCGAGAG 


CTGTTCCGAC 
GACAAGGCTG 


CCTGCCGCTT ACCGGATACC 
GGACGGCGAA TGGCCTATGG 




4051 


TGTCCGCCTT 
ACAGGCGGAA 


TCTCCCTTCG 
AGAGGGAAGC 


GGAAGCGTGG 
CCTTCGCACC 


CGCTTTCTCA ATGCTCACGC 
GCGAAAGAGT TACGAGTGCG 




4101 


TGTAGGTATC 
ACATCCATAG 


TCAGTTCGGT 
AGTCAAGCCA 


GTAGGTCGTT 
CATCCAGCAA 


CGCTCCAAGC TGGGCTGTGT 
GCGAGGTTCG ACCCGACACA 




4151 


GCACGAACCC 
CGTGCTTGGG 


CCCGTTCAGC 
GGGCAAGTCG 


CCGACCGCTG 
GGCTGGCGAC 


CGCCTTATCC GGTAACTATC 
GCGGAATAGG CCATTGATAG 




4201 


GTCTTGAGTC 
CAGAACTCAG 


CAACCCGGTA 
GTTGGGCCAT 


AGACACGACT 
TCTGTGCTGA 


TATCGCCACT GGCAGCAGCC 
ATAGCGGTGA CCGTCGTCGG 




4251 


ACTGGTAACA 
TGACCATTGT 


GGATTAGCAG 
CCTAATCGTC 


AGCGAGGTAT 
TCGCTCCATA 


GTAGGCGGTG CTACAGAGTT 
CATCCGCCAC GATGTCTCAA 




4301 


CTTGAAGTGG 
GAACTTCACC 


TGGCCTAACT 
ACCGGATTGA 


ACGGCTACAC 
TGCCGATGTG 


TAGAAGGACA GTATTTGGTA 
ATCTTCCTGT CATAAACCAT 




4351 


TCTGCGCTCT 
AGACGCGAGA 


GCTGAAGCCA 
CGACTTCGGT 


GTTACCTTCG 
CAATGGAAGC 


GAAAAAGAGT TGGTAGCTCT 
CTTTTTCTCA ACCATCGAGA 




4401 


TGATCCGGCA 
ACTAGGCCGT 


AACAAACCAC 
TTGTTTGGTG 


CGCTGGTAGC 
GCGACCATCG 


GGTGGTTTTT TTGTTTGCAA 
CCACCAAAAA AACAAACGTT 




4451 


GCAGCAGATT 
CGTCGTCTAA 


ACGCGCAGAA 
TGCGCGTCTT 


AAAAAGGATC 
TTTTTCCTAG 


TCAAGAAGAT CCTTTGATCT 
AGTTCTTCTA GGAAACTAGA 




4501 


TTTCTACGGG 
AAAGATGCCC 


GTCTGACGCT 
CAGACTGCGA 


CAGTGGAACG 
GTCACCTTGC 


AAAACTCACG TTAAGGGATT 
TTTTGAGTGC AATTCCCTAA 





4551 


TTGGTCATGA 
AACCAGTACT 


GATTATCAAA 
CTAATAGTTT 


AAGGATCTTC 
TTCCTAGAAG 


ACCTAGATCC TTTTAAATTA 
TGGATCTAGG AAAATTTAAT 




4601 


AAAATGAAGT 
TTTTACTTCA 


TTTAAATCAA 
AAATTTAGTT


TCTAAAGTAT ATATGAGTAA ACTTGGTCTG 
AGATTTCATA TATACTCATT TGAACCAGAC 




4651 


ACAGTTACCA 
TGTCAATGGT 


ATGCTTAATC 
TACGAATTAG 


AGTGAGGCAC 
TCACTCCGTG 


CTATCTCAGC GATCTGTCTA 
GATAGAGTCG CTAGACAGAT 




4701 


TTTCGTTCAT 
AAAGCAAGTA 


CCATAGTTGC 
GGTATCAACG 


CTGACTCCCC 
GACTGAGGGG 


GTCGTGTAGA TAACTACGAT 
CAGCACATCT ATTGATGCTA 




4751 


ACGGGAGGGC 
TGCCCTCCCG 


TTACCATCTG 
AATGGTAGAC 


GCCCCAGTGC 
CGGGGTCACG 


TGCAATGATA CCGCGAGACC 
ACGTTACTAT GGCGCTCTGG 





4801 


CACGCTCACC 
GTGCGAGTGG 


GGCTCCAGAT 
CCGAGGTCTA 


TTATCAGCAA 
AATAGTCGTT 


TAAACCAGCC AGCCGGAAGG 
ATTTGGTCGG TCGGCCTTCC 




4851 


GCCGAGCGCA 
CGGCTCGCGT 


GAAGTGGTCC 
CTTCACCAGG 


TGCAACTTTA 
ACGTTGAAAT 


TCCGCCTCCA TCCAGTCTAT 
AGGCGGAGGT AGGTCAGATA 




4901 


TAATTGTTGC 
ATTAACAACG 


CGGGAAGCTA 
GCCCTTCGAT 


GAGTAAGTAG 
CTCATTCATC 


TTCGCCAGTT AATAGTTTGC 
AAGCGGTCAA TTATCAAACG 
4951 GCAACGTTGT TGCCATTGCT ACAGGCATCG TGGTGTCACG CTCGTCGTTT
CGTTGCAACA ACGGTAACGA TGTCCGTAGC ACCACAGTGC GAGCAGCAAA

5001 GGTATGGCTT CATTCAGCTC CGGTTCCCAA CGATCAAGGC GAGTTACATG 
CCATACCGAA GTAAGTCGAG GCCAAGGGTT GCTAGTTCCG CTCAATGTAC 

5051 ATCCCCCATG TTGTGCAAAA AAGCGGTTAG CTCCTTCGGT CCTCCGATCG 
TAGGGGGTAC AACACGTTTT TTCGCCAATC GAGGAAGCCA GGAGGCTAGC

5101 TTGTCAGAAG TAAGTTGGCC GCAGTGTTAT CACTCATGGT TATGGCAGCA 
AACAGTCTTC ATTCAACCGG CGTCACAATA GTGAGTACCA ATACCGTCGT 

5151 CTGCATAATT CTCTTACTGT CATGCCATCC GTAAGATGCT TTTCTGTGAC 
GACGTATTAA GAGAATGACA GTACGGTAGG CATTCTACGA AAAGACACTG

5201 TGGTGAGTAC TCAACCAAGT CATTCTGAGA ATAGTGTATG CGGCGACCGA 
ACCACTCATG AGTTGGTTCA GTAAGACTCT TATCACATAC GCCGCTGGCT

5251 GTTGCTCTTG CCCGGCGTCA ATACGGGATA ATACCGCGCC ACATAGCAGA 
CAACGAGAAC GGGCCGCAGT TATGCCCTAT TATGGCGCGG TGTATCGTCT 

5301 ACTTTAAAAG TGCTCATCAT TGGAAAACGT TCTTCGGGGC GAAAACTCTC 
TGAAATTTTC ACGAGTAGTA ACCTTTTGCA AGAAGCCCCG CTTTTGAGAG

5351 AAGGATCTTA CCGCTGTTGA GATCCAGTTC GATGTAACCC ACTCGTGCAC 
TTCCTAGAAT GGCGACAACT CTAGGTCAAG CTACATTGGG TGAGCACGTG 

5401 CCAACTGATC TTCAGCATCT TTTACTTTCA CCAGCGTTTC TGGGTGAGCA 
GGTTGACTAG AAGTCGTAGA AAATGAAAGT GGTCGCAAAG ACCCACTCGT

5451 AAAACAGGAA GGCAAAATGC CGCAAAAAAG GGAATAAGGG CGACACGGAA 
TTTTGTCCTT CCGTTTTACG GCGTTTTTTC CCTTATTCCC GCTGTGCCTT

5501 ATGTTGAATA CTCATACTCT TCCTTTTTCA ATATTATTGA AGCATTTATC 
TACAACTTAT GAGTATGAGA AGGAAAAAGT TATAATAACT TCGTAAATAG 

5551 AGGGTTATTG TCTCATGAGC GGATACATAT TTGAATGTAT TTAGAAAAAT 
TCCCAATAAC AGAGTACTCG CCTATGTATA AACTTACATA AATCTTTTTA 

5601 AAACAAATAG GGGTTCCGCG CACATTTCCC CGAAAAGTGC CACCTGACGT 
TTTGTTTATC CCCAAGGCGC GTGTAAAGGG GCTTTTCACG GTGGACTGCA 

5651 C
G
1


GACGGATCGG 
CTGCCTAGCC 


GAGATCTCCC 
CTCTAGAGGG 


GATCCCCTAT 
CTAGGGGATA 


GGTCGACTCT 
CCAGCTGAGA 


CAGTACAATC 
GTCATGTTAG 




51 


TGCTCTGATG 
ACGAGACTAC 


CCGCATAGTT 
GGCGTATCAA 


AAGCCAGTAT 
TTCGGTCATA 


CTGCTCCCTG 
GACGAGGGAC 


CTTGTGTGTT 
GAACACACAA 




101 


GGAGGTCGCT 
CCTCCAGCGA 


GAGTAGTGCG 
CTCATCACGC 


CGAGCAAAAT 
GCTCGTTTTA


TTAAGCTACA 
AATTCGATGT 


ACAAGGCAAG 
TGTTCCGTTC 




151 


GCTTGACCGA 
CGAACTGGCT 


CAATTGCATG 
GTTAACGTAC 


AAGAATCTGC 
TTCTTAGACG 


TTAGGGTTAG 
AATCCCAATC 


GCGTTTTGCG 
CGCAAAACGC 




201 


CTGCTTCGCG 
GACGAAGCGC 


ATGTACGGGC 
TACATGCCCG 


CAGATATACG 
GTCTATATGC 


CGTTGACATT 
GCAACTGTAA 


GATTATTGAC 
CTAATAACTG 




251 


TAGTTATTAA 
ATCAATAATT 


TAGTAATCAA 
ATCATTAGTT 


TTACGGGGTC 
AATGCCCCAG 


ATTAGTTCAT 
TAATCAAGTA 


AGCCCATATA 
TCGGGTATAT 




301 


TGGAGTTCCG 
ACCTCAAGGC 


CGTTACATAA 
GCAATGTATT 


CTTACGGTAA 
GAATGCCATT 


ATGGCCCGCC 
TACCGGGCGG 


TGGCTGACCG 
ACCGACTGGC 




351 


CCCAACGACC 
GGGTTGCTGG 


CCCGCCCATT 
GGGCGGGTAA 


GACGTCAATA 
CTGCAGTTAT 


ATGACGTATG 
TACTGCATAC 


TTCCCATAGT 
AAGGGTATCA 




401 


AACGCCAATA 
TTGCGGTTAT 


GGGACTTTCC 
CCCTGAAAGG 


ATTGACGTCA 
TAACTGCAGT 


ATGGGTGGAC 
TACCCACCTG 


TATTTACGGT 
ATAAATGCCA 




451 


AAACTGCCCA 
TTTGACGGGT 


CTTGGCAGTA 
GAACCGTCAT 


CATCAAGTGT 
GTAGTTCACA 


ATCATATGCC 
TAGTATACGG 


AAGTACGCCC 
TTCATGCGGG 




501 


CCTATTGACG 
GGATAACTGC 


TCAATGACGG 
AGTTACTGCC 


TAAATGGCCC 
ATTTACCGGG 


GCCTGGCATT 
CGGACCGTAA 


ATGCCCAGTA 
TACGGGTCAT 




551 


CATGACCTTA 
GTACTGGAAT 


TGGGACTTTC 
ACCCTGAAAG 


CTACTTGGCA 
GATGAACCGT 


GTACATCTAC 
CATGTAGATG 


GTATTAGTCA 
CATAATCAGT 




601 


TCGCTATTAC 
AGCGATAATG 


CATGGTGATG 
GTACCACTAC 


CGGTTTTGGC 
GCCAAAACCG 


AGTACATCAA 
TCATGTAGTT 


TGGGCGTGGA 
ACCCGCACCT 




651 


TAGCGGTTTG 
ATCGCCAAAC 


ACTCACGGGG 
TGAGTGCCCC 


ATTTCCAAGT 
TAAAGGTTCA 


CTCCACCCCA 
GAGGTGGGGT 


TTGACGTCAA 
AACTGCAGTT 




701 


TGGGAGTTTG 
ACCCTCAAAC 


TTTTGGCACC 
AAAACCGTGG 


AAAATCAACG 
TTTTAGTTGC 


GGACTTTCCA 
CCTGAAAGGT 


AAATGTCGTA 
TTTACAGCAT 




751 


ACAACTCCGC 
TGTTGAGGCG 


CCCATTGACG 
GGGTAACTGC 


CAAATGGGCG 
GTTTACCCGC 


GTAGGCGTGT 
CATCCGCACA 


ACGGTGGGAG 
TGCCACCCTC 




801


GTCTATATAA 
CAGATATATT 


GCAGAGCTCT 
CGTCTCGAGA 


CTGGCTAACT 
GACCGATTGA 


AGAGAACCCA 
TCTCTTGGGT 


CTGCTTACTG 
GACGAATGAC 




851 


GCTTATCGAA 
CGAATAGCTT 


ATTAATACGA 
TAATTATGCT 


CTCACTATAG 
GAGTGATATC 


GGAGACCCAA GCTGGCTAGC 
CCTCTGGGTT CGACCGATCG 




901 


TTATTGCGGT 
AATAACGCCA 


AGTTTATCAC 
TCAAATAGTG 


AGTTAAATTG 
TCAATTTAAC 


CTAACGCAGT 
GATTGCGTCA 


CAGTGCTTCT 
GTCACGAAGA 




951 


GACACAACAG 
CTGTGTTGTC 


TCTCGAACTT 
AGAGCTTGAA 


AAGCTGCAGT 
TTCGACGTCA 


GACTCTCTTA 
CTGAGAGAAT 


AGGTAGCCTT 
TCCATCGGAA 




1001 


GCAGAAGTTG 
CGTCTTCAAC 


GTCGTGAGGC 
CAGCACTCCG 


ACTGGGCAGG 
TGACCCGTCC 


TAAGTATCAA 
ATTCATAGTT 


GGTTACAAGA 
CCAATGTTCT 




1051 


CAGGTTTAAG 
GTCCAAATTC 


GAGACCAATA 
CTCTGGTTAT 


GAAACTGGGC 
CTTTGACCCG 


TTGTCGAGAC 
AACAGCTCTG 


AGAGAAGACT 
TCTCTTCTGA 




1101 


CTTGCGTTTC 
GAACGCAAAG 


TGATAGGCAC 
ACTATCCGTG 


CTATTGGTCT 
GATAACCAGA 


TACTGACATC 
ATGACTGTAG 


CACTTTGCCT 
GTGAAACGGA 




1151 


TTCTCTCCAC 
AAGAGAGGTG 


AGGTGTCCAC 
TCCACAGGTG 


TCCCAGTTCA 
AGGGTCAAGT 


ATTACAGCTC 
TAATGTCGAG 


TTAAAAGCTT 
AATTTTCGAA 










Met Asp Tyr Tyr Arg Lys Tyr Ala Ala




1201 


GGTACCGAGC 
CCATGGCTCG 


TCGGATCCGC 
AGCCTAGGCG 


CACCATGGAC 
GTGGTACCTG 


TACTACCGCA 
ATGATGGCGT 


AGTACGCCGC 
TCATGCGGCG 





10 July 2000 14:58:12 Page 1



FIGURE 4 (p. 2/5) 

Ala Ile Phe Leu Val Thr Leu Ser Val Phe Leu His Val Leu His Ser Ala Asn
1251 CATCTTCCTG GTGACCCTGA GCGTGTTCCT GCACGTGCTG CACAGCGCCA 

GTAGAAGGAC CACTGGGACT CGCACAAGGA CGTGCACGAC GTGTCGCGGT 

Asn Ile Thr Val Asn Ile Thr Val Ala Pro Asp Val Gln Asp Cys Pro Glu
1301 ACATCACCGT TAACATCACC GTGGCCCCCG ACGTGCAGGA CTGCCCCGAG 

TGTAGTGGCA ATTGTAGTGG CACCGGGGGC TGCACGTCCT GACGGGGCTC 

Cys Thr Leu Gln Glu Asn Pro Phe Phe Ser Gln Pro Gly Ala Pro Ile Leu
1351 TGCACCCTGC AGGAGAACCC CTTCTTCAGC CAGCCCGGCG CCCCCATCCT 

ACGTGGGACG TCCTCTTGGG GAAGAAGTCG GTCGGGCCGC GGGGGTAGGA 

Leu Gln Cys Met Gly Cys Cys Phe Ser Arg Ala Tyr Pro Thr Pro Leu Arg Ser

1401 GCAGTGCATG GGCTGCTGCT TCAGCCGCGC CTACCCCACC CCCCTGCGCA 

CGTCACGTAC CCGACGACGA AGTCGGCGCG GATGGGGTGG GGGGACGCGT

Ser Lys Lys Thr Met Leu Val Gln Lys Asn Val Thr Ser Glu Ser Thr Cys
1451 GCAAGAAGAC CATGCTGGTG CAGAAGAACG TGACCAGCGA GAGCACCTGC

CGTTCTTCTG GTACGACCAC GTCTTCTTGC ACTGGTCGCT CTCGTGGACG

Cys Val Ala Lys Ser Tyr Asn Arg Val Thr Val Met Gly Gly Phe Lys Val
1501 TGCGTGGCCA AGAGCTACAA CCGCGTGACC GTGATGGGCG GCTTCAAGGT 

ACGCACCGGT TCTCGATGTT GGCGCACTGG CACTACCCGC CGAAGTTCCA 

Val Glu Asn His Thr Ala Cys His Cys Ser Thr Cys Tyr Tyr His Lys Ser *
1551 GGAGAACCAC ACCGCCTGCC ACTGCAGCAC CTGCTACTAC CACAAGAGCT 

CCTCTTGGTG TGGCGGACGG TGACGTCGTG GACGATGATG GTGTTCTCGA 

1601 AATCTAGAGG GCCCGTTTAA ACCCGCTGAT CAGCCTCGAC TGTGCCTTCT 

TTAGATCTCC CGGGCAAATT TGGGCGACTA GTCGGAGCTG ACACGGAAGA 

1651 AGTTGCCAGC CATCTGTTGT TTGCCCCTCC CCCGTGCCTT CCTTGACCCT 

TCAACGGTCG GTAGACAACA AACGGGGAGG GGGCACGGAA GGAACTGGGA 

1701 GGAAGGTGCC ACTCCCACTG TCCTTTCCTA ATAAAATGAG GAAATTGCAT 

CCTTCCACGG TGAGGGTGAC AGGAAAGGAT TATTTTACTC CTTTAACGTA 

1751 CGCATTGTCT GAGTAGGTGT CATTCTATTC TGGGGGGTGG GGTGGGGCAG 

GCGTAACAGA CTCATCCACA GTAAGATAAG ACCCCCCACC CCACCCCGTC 

1801 GACAGCAAGG GGGAGGATTG GGAAGACAAT AGCAGGCATG CTGGGGATGC 

CTGTCGTTCC CCCTCCTAAC CCTTCTGTTA TCGTCCGTAC GACCCCTACG 

1851 GGTGGGCTCT ATGGCTTCTG AGGCGGAAAG AACCAGCTGG GGCTCTAGGG 

CCACCCGAGA TACCGAAGAC TCCGCCTTTC TTGGTCGACC CCGAGATCCC

1901 GGTATCCCCA CGCGCCCTGT AGCGGCGCAT TAAGCGCGGC GGGTGTGGTG 

CCATAGGGGT GCGCGGGACA TCGCCGCGTA ATTCGCGCCG CCCACACCAC 

1951 GTTACGCGCA GCGTGACCGC TACACTTGCC AGCGCCCTAG CGCCCGCTCC 

CAATGCGCGT CGCACTGGCG ATGTGAACGG TCGCGGGATC GCGGGCGAGG

2001 TTTCGCTTTC TTCCCTTCCT TTCTCGCCAC GTTCGCCGGC TTTCCCCGTC 

AAAGCGAAAG AAGGGAAGGA AAGAGCGGTG CAAGCGGCCG AAAGGGGCAG 

2051 AAGCTCTAAA TCGGGGCATC CCTTTAGGGT TCCGATTTAG TGCTTTACGG 

TTCGAGATTT AGCCCCGTAG GGAAATCCCA AGGCTAAATC ACGAAATGCC 

2101 CACCTCGACC CCAAAAAACT TGATTAGGGT GATGGTTCAC GTAGTGGGCC 

GTGGAGCTGG GGTTTTTTGA ACTAATCCCA CTACCAAGTG CATCACCCGG 

2151 ATCGCCCTGA TAGACGGTTT TTCGCCCTTT GACGTTGGAG TCCACGTTCT 

TAGCGGGACT ATCTGCCAAA AAGCGGGAAA CTGCAACCTC AGGTGCAAGA 

2201 TTAATAGTGG ACTCTTGTTC CAAACTGGAA CAACACTCAA CCCTATCTCG 

AATTATCACC TGAGAACAAG GTTTGACCTT GTTGTGAGTT GGGATAGAGC 

2251 GTCTATTCTT TTGATTTATA AGGGATTTTG GGGATTTCGG CCTATTGGTT 

CAGATAAGAA AACTAAATAT TCCCTAAAAC CCCTAAAGCC GGATAACCAA 

2301 AAAAAATGAG CTGATTTAAC AAAAATTTAA CGCGAATTAA TTCTGTGGAA 

TTTTTTACTC GACTAAATTG TTTTTAAATT GCGCTTAATT AAGACACCTT

2351 TGTGTGTCAG TTAGGGTGTG GAAAGTCCCC AGGCTCCCCA GGCAGGCAGA 
ACACACAGTC AATCCCACAC CTTTCAGGGG TCCGAGGGGT CCGTCCGTCT 
2401


AGTATGCAAA
TCATACGTTT 


GCATGCATCT 
CGTACGTAGA 


CAATTAGTCA 
GTTAATCAGT 


GCAACCAGGT 
CGTTGGTCCA 


GTGGAAAGTC 
CACCTTTCAG 




2451 


CCCAGGCTCC 
GGGTCCGAGG 


CCAGCAGGCA 
GGTCGTCCGT 


GAAGTATGCA 
CTTCATACGT 


AAGCATGCAT 
TTCGTACGTA 


CTCAATTAGT 
GAGTTAATCA 




2501 


CAGCAACCAT 
GTCGTTGGTA 


AGTCCCGCCC 
TCAGGGCGGG 


CTAACTCCGC 
GATTGAGGCG 


CCATCCCGCC 
GGTAGGGCGG 


CCTAACTCCG 
GGATTGAGGC 




2551 


CCCAGTTCCG 
GGGTCAAGGC 


CCCATTCTCC 
GGGTAAGAGG 


GCCCCATGGC 
CGGGGTACCG 


TGACTAATTT 
ACTGATTAAA 


TTTTTATTTA 
AAAAATAAAT 




2601 


TGCAGAGGCC 
ACGTCTCCGG 


GAGGCCGCCT 
CTCCGGCGGA 


CTGCCTCTGA 
GACGGAGACT 


GCTATTCCAG 
CGATAAGGTC 


AAGTAGTGAG 
TTCATCACTC 




2651 


GAGGCTTTTT 
CTCCGAAAAA 


TGGAGGCCTA 
ACCTCCGGAT 


GGCTTTTGCA 
CCGAAAACGT 


AAAAGCTCCC 
TTTTCGAGGG 


GGGAGCTTGT 
CCCTCGAACA 




2701 


ATATCCATTT 
TATAGGTAAA


TCGGATCTGA 
AGCCTAGACT 


TCAGCACGTG 
AGTCGTGCAC 


ATGAAAAAGC 
TACTTTTTCG 


CTGAACTCAC 
GACTTGAGTG 





2751 


CGCGACGTCT 
GCGCTGCAGA 


GTCGAGAAGT 
CAGCTCTTCA 


TTCTGATCGA 
AAGACTAGCT 


AAAGTTCGAC 
TTTCAAGCTG 


AGCGTCTCCG 
TCGCAGAGGC 




2801 


ACCTGATGCA 
TGGACTACGT 


GCTCTCGGAG 
CGAGAGCCTC 


GGCGAAGAAT 
CCGCTTCTTA 


CTCGTGCTTT 
GAGCACGAAA 


CAGCTTCGAT 
GTCGAAGCTA 




2851 


GTAGGAGGGC 
CATCCTCCCG 


GTGGATATGT 
CACCTATACA 


CCTGCGGGTA 
GGACGCCCAT 


AATAGCTGCG 
TTATCGACGC 


CCGATGGTTT 
GGCTACCAAA 





2901 


CTACAAAGAT 
GATGTTTCTA 


CGTTATGTTT 
GCAATACAAA 


ATCGGCACTT 
TAGCCGTGAA 


TGCATCGGCC 
ACGTAGCCGG 


GCGCTCCCGA 
CGCGAGGGCT 




2951 


TTCCGGAAGT 
AAGGCCTTCA 


GCTTGACATT 
CGAACTGTAA 


GGGGAATTCA 
CCCCTTAAGT 


GCGAGAGCCT 
CGCTCTCGGA 


GACCTATTGC 
CTGGATAACG 




3001 


ATCTCCCGCC 
TAGAGGGCGG 


GTGCACAGGG 
CACGTGTCCC 


TGTCACGTTG 
ACAGTGCAAC 


CAAGACCTGC 
GTTCTGGACG 


CTGAAACCGA 
GACTTTGGCT 




3051 


ACTGCCCGCT 
TGACGGGCGA 


GTTCTGCAGC 
CAAGACGTCG 


CGGTCGCGGA 
GCCAGCGCCT 


GGCCATGGAT 
CCGGTACCTA 


GCGATCGCTG 
CGCTAGCGAC 




3101 


CGGCCGATCT 
GCCGGCTAGA 


TAGCCAGACG 
ATCGGTCTGC 


AGCGGGTTCG 
TCGCCCAAGC 


GCCCATTCGG 
CGGGTAAGCC 


ACCGCAAGGA 
TGGCGTTCCT 




3151 


ATCGGTCAAT 
TAGCCAGTTA 


ACACTACATG 
TGTGATGTAC 


GCGTGATTTC 
CGCACTAAAG 


ATATGCGCGA 
TATACGCGCT 


TTGCTGATCC 
AACGACTAGG 




3201 


CCATGTGTAT 
GGTACACATA 


CACTGGCAAA 
GTGACCGTTT 


CTGTGATGGA 
GACACTACCT 


CGACACCGTC 
GCTGTGGCAG 


AGTGCGTCCG 
TCACGCAGGC 




3251 


TCGCGCAGGC 
AGCGCGTCCG 


TCTCGATGAG 
AGAGCTACTC 


CTGATGCTTT 
GACTACGAAA 


GGGCCGAGGA 
CCCGGCTCCT 


CTGCCCCGAA 
GACGGGGCTT 




3301 


GTCCGGCACC 
CAGGCCGTGG 


TCGTGCACGC 
AGCACGTGCG 


GGATTTCGGC 
CCTAAAGCCG 


TCCAACAATG 
AGGTTGTTAC 


TCCTGACGGA 
AGGACTGCCT 




3351 


CAATGGCCGC 
GTTACCGGCG 


ATAACAGCGG 
TATTGTCGCC 


TCATTGACTG 
AGTAACTGAC 


GAGCGAGGCG 
CTCGCTCCGC 


ATGTTCGGGG 
TACAAGCCCC 




3401 


ATTCCCAATA 
TAAGGGTTAT 


CGAGGTCGCC 
GCTCCAGCGG 


AACATCTTCT 
TTGTAGAAGA 


TCTGGAGGCC 
AGACCTCCGG 


GTGGTTGGCT 
CACCAACCGA 




3451 


TGTATGGAGC 
ACATACCTCG 


AGCAGACGCG 
TCGTCTGCGC 


CTACTTCGAG 
GATGAAGCTC 


CGGAGGCATC 
GCCTCCGTAG 


CGGAGCTTGC 
GCCTCGAACG 




3501 


AGGATCGCCG 
TCCTAGCGGC 


CGGCTCCGGG 
GCCGAGGCCC 


CGTATATGCT 
GCATATACGA 


CCGCATTGGT 
GGCGTAACCA 


CTTGACCAAC 
GAACTGGTTG 




3551 


TCTATCAGAG 
AGATAGTCTC 


CTTGGTTGAC 
GAACCAACTG 


GGCAATTTCG 
CCGTTAAAGC 


ATGATGCAGC 
TACTACGTCG 


TTGGGCGCAG 
AACCCGCGTC 




3601 


GGTCGATGCG 
CCAGCTACGC 


ACGCAATCGT 
TGCGTTAGCA 


CCGATCCGGA 
GGCTAGGCCT 


GCCGGGACTG 
CGGCCCTGAC 


TCGGGCGTAC 
AGCCCGCATG 




3651 


ACAAATCGCC 
TGTTTAGCGG 


CGCAGAAGCG 
GCGTCTTCGC 


CGGCCGTCTG 
GCCGGCAGAC 


GACCGATGGC 
CTGGCTACCG 


TGTGTAGAAG 
ACACATCTTC 




3701


TACTCGCCGA 
ATGAGCGGCT 


TAGTGGAAAC 
ATCACCTTTG 


CGACGCCCCA 
GCTGCGGGGT 


GCACTCGTCC 
CGTGAGCAGG 


GAGGGCAAAG 
CTCCCGTTTC 




3751 


GAATAGCACG 
CTTATCGTGC 


TGCTACGAGA 
ACGATGCTCT 


TTTCGATTCC 
AAAGCTAAGG 


ACCGCCGCCT 
TGGCGGCGGA 


TCTATGAAAG 
AGATACTTTC 




3801 


GTTGGGCTTC 
CAACCCGAAG 


GGAATCGTTT 
CCTTAGCAAA 


TCCGGGACGC 
AGGCCCTGCG 


CGGCTGGATG 
GCCGACCTAC 


ATCCTCCAGC 
TAGGAGGTCG 




3851 


GCGGGGATCT 
CGCCCCTAGA 


CATGCTGGAG 
GTACGACCTC 


TTCTTCGCCC 
AAGAAGCGGG 


ACCCCAACTT 
TGGGGTTGAA 


GTTTATTGCA 
CAAATAACGT 




3901 


GCTTATAATG 
CGAATATTAC 


GTTACAAATA 
CAATGTTTAT 


AAGCAATAGC 
TTCGTTATCG 


ATCACAAATT 
TAGTGTTTAA 


TCACAAATAA 
AGTGTTTATT 





3951 


AGCATTTTTT 
TCGTAAAAAA 


TCACTGCATT 
AGTGACGTAA 


CTAGTTGTGG 
GATCAACACC 


TTTGTCCAAA 
AAACAGGTTT 


CTCATCAATG 
GAGTAGTTAC 




4001 


TATCTTATCA 
ATAGAATAGT 


TGTCTGTATA 
ACAGACATAT 


CCGTCGACCT 
GGCAGCTGGA 


CTAGCTAGAG 
GATCGATCTC 


CTTGGCGTAA 
GAACCGCATT 





4051 


TCATGGTCAT 
AGTACCAGTA 


AGCTGTTTCC 
TCGACAAAGG 


TGTGTGAAAT 
ACACACTTTA 


TGTTATCCGC 
ACAATAGGCG 


TCACAATTCC 
AGTGTTAAGG 




4101 


ACACAACATA 
TGTGTTGTAT 


CGAGCCGGAA 
GCTCGGCCTT 


GCATAAAGTG 
CGTATTTCAC 


TAAAGCCTGG 
ATTTCGGACC 


GGTGCCTAAT 
CCACGGATTA 




4151 


GAGTGAGCTA 
CTCACTCGAT 


ACTCACATTA 
TGAGTGTAAT 


ATTGCGTTGC 
TAACGCAACG 


GCTCACTGCC 
CGAGTGACGG 


CGCTTTCCAG 
GCGAAAGGTC 




4201 


TCGGGAAACC 
AGCCCTTTGG 


TGTCGTGCCA 
ACAGCACGGT 


GCTGCATTAA 
CGACGTAATT 


TGAATCGGCC 
ACTTAGCCGG 


AACGCGCGGG 
TTGCGCGCCC 




4251 


GAGAGGCGGT 
CTCTCCGCCA 


TTGCGTATTG 
AACGCATAAC 


GGCGCTCTTC 
CCGCGAGAAG 


CGCTTCCTCG 
GCGAAGGAGC 


CTCACTGACT 
GAGTGACTGA 




4301 


CGCTGCGCTC 
GCGACGCGAG 


GGTCGTTCGG 
CCAGCAAGCC 


CTGCGGCGAG 
GACGCCGCTC 


CGGTATCAGC 
GCCATAGTCG 


TCACTCAAAG 
AGTGAGTTTC 




4351 


GCGGTAATAC 
CGCCATTATG 


GGTTATCCAC 
CCAATAGGTG 


AGAATCAGGG 
TCTTAGTCCC 


GATAACGCAG 
CTATTGCGTC 


GAAAGAACAT 
CTTTCTTGTA 




4401 


GTGAGCAAAA 
CACTCGTTTT 


GGCCAGCAAA 
CCGGTCGTTT 


AGGCCAGGAA 
TCCGGTCCTT 


CCGTAAAAAG 
GGCATTTTTC 


GCCGCGTTGC 
CGGCGCAACG 




4451 


TGGCGTTTTT 
ACCGCAAAAA 


CCATAGGCTC 
GGTATCCGAG 


CGCCCCCCTG 
GCGGGGGGAC 


ACGAGCATCA 
TGCTCGTAGT 


CAAAAATCGA 
GTTTTTAGCT 




4501 


CGCTCAAGTC 
GCGAGTTCAG 


AGAGGTGGCG 
TCTCCACCGC 


AAACCCGACA 
TTTGGGCTGT 


GGACTATAAA 
CCTGATATTT 


GATACCAGGC 
CTATGGTCCG 




4551 


GTTTCCCCCT 
CAAAGGGGGA 


GGAAGCTCCC 
CCTTCGAGGG 


TCGTGCGCTC 
AGCACGCGAG 


TCCTGTTCCG 
AGGACAAGGC 


ACCCTGCCGC 
TGGGACGGCG 




4601 


TTACCGGATA 
AATGGCCTAT 


CCTGTCCGCC 
GGACAGGCGG 


TTTCTCCCTT 
AAAGAGGGAA 


CGGGAAGCGT 
GCCCTTCGCA 


GGCGCTTTCT 
CCGCGAAAGA 




4651 


CAATGCTCAC 
GTTACGAGTG 


GCTGTAGGTA 
CGACATCCAT 


TCTCAGTTCG 
AGAGTCAAGC 


GTGTAGGTCG 
CACATCCAGC 


TTCGCTCCAA 
AAGCGAGGTT 




4701 


GCTGGGCTGT 
CGACCCGACA 


GTGCACGAAC 
CACGTGCTTG 


CCCCCGTTCA 
GGGGGCAAGT 


GCCCGACCGC 
CGGGCTGGCG 


TGCGCCTTAT 
ACGCGGAATA 




4751 


CCGGTAACTA 
GGCCATTGAT 


TCGTCTTGAG 
AGCAGAACTC 


TCCAACCCGG 
AGGTTGGGCC 


TAAGACACGA 
ATTCTGTGCT 


CTTATCGCCA 
GAATAGCGGT 




4801 


CTGGCAGCAG 
GACCGTCGTC 


CCACTGGTAA 
GGTGACCATT 


CAGGATTAGC 
GTCCTAATCG 


AGAGCGAGGT 
TCTCGCTCCA 


ATGTAGGCGG 
TACATCCGCC 




4851 


TGCTACAGAG 
ACGATGTCTC 


TTCTTGAAGT 
AAGAACTTCA 


GGTGGCCTAA 
CCACCGGATT 


CTACGGCTAC 
GATGCCGATG 


ACTAGAAGGA 
TGATCTTCCT 




4901 


CAGTATTTGG 
GTCATAAACC 


TATCTGCGCT 
ATAGACGCGA 


CTGCTGAAGC 
GACGACTTCG 


CAGTTACCTT 
GTCAATGGAA 


CGGAAAAAGA 
GCCTTTTTCT 




4951 


GTTGGTAGCT 
CAACCATCGA 


CTTGATCCGG 
GAACTAGGCC 


CAAACAAACC 
GTTTGTTTGG 


ACCGCTGGTA 
TGGCGACCAT 


GCGGTGGTTT 
CGCCACCAAA 




5001


TTTTGTTTGC
AAAACAAACG 


AAGCAGCAGA 
TTCGTCGTCT 


TTACGCGCAG 
AATGCGCGTC 


AAAAAAAGGA 
TTTTTTTCCT 


TCTCAAGAAG 
AGAGTTCTTC 




5051


ATCCTTTGAT
TAGGAAACTA 


CTTTTCTACG
GAAAAGATGC 


GGGTCTGACG 
CCCAGACTGC 


CTCAGTGGAA 
GAGTCACCTT 


CGAAAACTCA 
GCTTTTGAGT 




5101


CGTTAAGGGA
GCAATTCCCT 


TTTTGGTCAT 
AAAACCAGTA 


GAGATTATCA 
CTCTAATAGT 


AAAAGGATCT 
TTTTCCTAGA 


TCACCTAGAT 
AGTGGATCTA 




5151


CCTTTTAAAT
GGAAAATTTA 


TAAAAATGAA 
ATTTTTACTT 


GTTTTAAATC 
CAAAATTTAG 


AATCTAAAGT 
TTAGATTTCA 


ATATATGAGT 
TATATACTCA 




5201


AAACTTGGTC
TTTGAACCAG 


TGACAGTTAC 
ACTGTCAATG 


CAATGCTTAA 
GTTACGAATT 


TCAGTGAGGC 
AGTCACTCCG 


ACCTATCTCA 
TGGATAGAGT 




5251


GCGATCTGTC
CGCTAGACAG 


TATTTCGTTC
ATAAAGCAAG


ATCCATAGTT
TAGGTATCAA 


GCCTGACTCC 
CGGACTGAGG 


CCGTCGTGTA 
GGCAGCACAT 




5301


GATAACTACG
CTATTGATGC 


ATACGGGAGG 
TATGCCCTCC 


GCTTACCATC 
CGAATGGTAG 


TGGCCCCAGT 
ACCGGGGTCA 


GCTGCAATGA 
CGACGTTACT 




5351


TACCGCGAGA
ATGGCGCTCT 


CCCACGCTCA 
GGGTGCGAGT 


CCGGCTCCAG 
GGCCGAGGTC 


ATTTATCAGC 
TAAATAGTCG 


AATAAACCAG 
TTATTTGGTC 




5401


CCAGCCGGAA
GGTCGGCCTT 


GGGCCGAGCG 
CCCGGCTCGC 


CAGAAGTGGT 
GTCTTCACCA 


CCTGCAACTT 
GGACGTTGAA 


TATCCGCCTC
ATAGGCGGAG 




5451


CATCCAGTCT
GTAGGTCAGA 


ATTAATTGTT 
TAATTAACAA 


GCCGGGAAGC 
CGGCCCTTCG 


TAGAGTAAGT 
ATCTCATTCA 


AGTTCGCCAG 
TCAAGCGGTC 




5501


TTAATAGTTT
AATTATCAAA


GCGCAACGTT 
CGCGTTGCAA 


GTTGCCATTG 
CAACGGTAAC 


CTACAGGCAT 
GATGTCCGTA 


CGTGGTGTCA 
GCACCACAGT 




5551


CGCTCGTCGT
GCGAGCAGCA 


TTGGTATGGC 
AACCATACCG 


TTCATTCAGC
AAGTAAGTCG 


TCCGGTTCCC 
AGGCCAAGGG 


AACGATCAAG 
TTGCTAGTTC 




5601


GCGAGTTACA
CGCTCAATGT


TGATCCCCCA 
ACTAGGGGGT 


TGTTGTGCAA 
ACAACACGTT 


AAAAGCGGTT 
TTTTCGCCAA 


AGCTCCTTCG 
TCGAGGAAGC 




5651


GTCCTCCGAT
CAGGAGGCTA 


CGTTGTCAGA 
GCAACAGTCT 


AGTAAGTTGG 
TCATTCAACC 


CCGCAGTGTT 
GGCGTCACAA 


ATCACTCATG
TAGTGAGTAC 




5701


GTTATGGCAG
CAATACCGTC 


CACTGCATAA
GTGACGTATT


TTCTCTTACT
AAGAGAATGA


GTCATGCCAT 
CAGTACGGTA 


CCGTAAGATG 
GGCATTCTAC 




5751


CTTTTCTGTG
GAAAAGACAC 


ACTGGTGAGT

TGACCACTCA 


ACTCAACCAA 
TGAGTTGGTT 


GTCATTCTGA 
CAGTAAGACT 


GAATAGTGTA 
CTTATCACAT 




5801


TGCGGCGACC
ACGCCGCTGG 


GAGTTGCTCT 
CTCAACGAGA 


TGCCCGGCGT
ACGGGCCGCA


CAATACGGGA
GTTATGCCCT


TAATACCGCG 
ATTATGGCGC 




5851


CCACATAGCA
GGTGTATCGT 


GAACTTTAAA 
CTTGAAATTT 


AGTGCTCATC 
TCACGAGTAG 


ATTGGAAAAC
TAACCTTTTG 


GTTCTTCGGG
CAAGAAGCCC




5901


GCGAAAACTC
CGCTTTTGAG 


TCAAGGATCT 
AGTTCCTAGA 


TACCGCTGTT
ATGGCGACAA


GAGATCCAGT
CTCTAGGTCA


TCGATGTAAC 
AGCTACATTG 




5951


CCACTCGTGC
GGTGAGCACG


ACCCAACTGA 
TGGGTTGACT 


TCTTCAGCAT 
AGAAGTCGTA 


CTTTTACTTT
GAAAATGAAA


CACCAGCGTT 
GTGGTCGCAA 




6001


TCTGGGTGAG
AGACCCACTC


CAAAAACAGG 
GTTTTTGTCC 


AAGGCAAAAT
TTCCGTTTTA 


GCCGCAAAAA 
CGGCGTTTTT 


AGGGAATAAG 
TCCCTTATTC 




6051


GGCGACACGG
CCGCTGTGCC 


AAATGTTGAA 
TTTACAACTT 


TACTCATACT
ATGAGTATGA


CTTCCTTTTT
GAAGGAAAAA


CAATATTATT 
GTTATAATAA 




6101


GAAGCATTTA
CTTCGTAAAT 


TCAGGGTTAT 
AGTCCCAATA 


TGTCTCATGA 
ACAGAGTACT 


GCGGATACAT 
CGCCTATGTA 


ATTTGAATGT 
TAAACTTACA 


6151 


ATTTAGAAAA 
TAAATCTTTT 


ATAAACAAAT 
TATTTGTTTA 


AGGGGTTCCG 
TCCCCAAGGC 


CGCACATTTC 
GCGTGTAAAG 


CCCGAAAAGT 
GGGCTTTTCA 


6201 


GCCACCTGAC 
CGGTGGACTG 


GTC 
CAG 